A flexible technique to select objects via convolutional neural network in VR space

Li, Huiyu; Fan, Linwei

doi:10.1007/s11432-019-1517-3

Research Paper
Published: 23 December 2019

Volume 63, article number 112101, (2020)

Science China Information Sciences

Huiyu Li¹ & Linwei Fan²,³

Abstract

Most studies on selection techniques for projection-based VR systems depend on users wearing complex or expensive input devices; more convenient selection techniques are still lacking. In this paper, we propose a flexible 3D selection technique for a large-display, projection-based virtual environment. We present a body tracking method that uses a convolutional neural network (CNN) to estimate the 3D skeletons of multiple users, and a region-based selection method that effectively selects virtual objects using only the tracked fingertips of those users. Additionally, we introduce a multi-user merge method that realigns each user's actions with their perception when multiple users observe a single stereoscopic display. Compared with state-of-the-art CNN-based pose estimation methods, the proposed CNN-based body tracking method achieves considerable estimation accuracy while guaranteeing real-time performance. We also evaluate our selection technique against three prevalent selection techniques and test its performance in a multi-user scenario. The results show that our selection technique significantly increases efficiency and effectiveness, and offers comparable stability in support of multi-user interaction.
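The abstract does not specify how the region-based selection works internally, so the following is only a generic sketch of fingertip-driven region selection, not the authors' method. It assumes that fingertip positions are already tracked in the same 3D coordinate frame (here taken to be metres) as the virtual objects, that each object is approximated by a bounding sphere, and that the selection region is a sphere around the fingertip. The names `VirtualObject`, `select_in_region`, `select_for_users` and the `region_radius` value are hypothetical and chosen for illustration.

```python
import math
from dataclasses import dataclass
from typing import Dict, Optional, Sequence, Tuple

Point3 = Tuple[float, float, float]


@dataclass
class VirtualObject:
    """Hypothetical selectable object, approximated by a bounding sphere."""
    name: str
    center: Point3
    radius: float


def surface_distance(point: Point3, obj: VirtualObject) -> float:
    """Distance from a point to the object's bounding-sphere surface (negative if inside)."""
    return math.dist(point, obj.center) - obj.radius


def select_in_region(
    fingertip: Point3,
    objects: Sequence[VirtualObject],
    region_radius: float = 0.05,  # assumed region size, illustrative only
) -> Optional[VirtualObject]:
    """Among objects intersecting a sphere of radius `region_radius` centred on the
    fingertip, return the one whose surface is closest; None if none intersect."""
    candidates = [o for o in objects if surface_distance(fingertip, o) <= region_radius]
    if not candidates:
        return None
    return min(candidates, key=lambda o: surface_distance(fingertip, o))


def select_for_users(
    fingertips: Dict[str, Point3],
    objects: Sequence[VirtualObject],
    region_radius: float = 0.05,
) -> Dict[str, Optional[str]]:
    """Apply the region test independently to each tracked user's fingertip."""
    result: Dict[str, Optional[str]] = {}
    for user, tip in fingertips.items():
        chosen = select_in_region(tip, objects, region_radius)
        result[user] = chosen.name if chosen is not None else None
    return result


if __name__ == "__main__":
    scene = [
        VirtualObject("cube", (0.0, 0.0, 0.5), 0.05),
        VirtualObject("sphere", (0.3, 0.0, 0.5), 0.05),
    ]
    # Two users point at different objects at the same time.
    print(select_for_users({"user1": (0.02, 0.0, 0.5), "user2": (0.29, 0.0, 0.5)}, scene))
    # prints {'user1': 'cube', 'user2': 'sphere'}
```

Choosing the closest candidate rather than requiring the fingertip to touch an object tolerates some tracking noise, which is one plausible motivation for a region-based approach; a real system would also need the multi-user realignment step described in the abstract, which this sketch does not model.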

Author information

Authors and Affiliations

School of Computer Science and Technology, Shandong University, Jinan, 250101, China
Huiyu Li
Shandong Province Key Lab of Digital Media Technology, Shandong University of Finance and Economics, Jinan, 250061, China
Linwei Fan
Shandong Co-Innovation Center of Future Intelligent Computing, Shandong Technology and Business University, Yantai, 264005, China
Linwei Fan

Corresponding author

Correspondence to Huiyu Li.

About this article

Cite this article

Li, H., Fan, L. A flexible technique to select objects via convolutional neural network in VR space. Sci. China Inf. Sci. 63, 112101 (2020). https://doi.org/10.1007/s11432-019-1517-3

Received: 08 April 2019
Revised: 17 June 2019
Accepted: 01 August 2019
Published: 23 December 2019
DOI: https://doi.org/10.1007/s11432-019-1517-3
