Structure-Aware 3D Hand Pose Regression from a Single Depth Image

Malik, Jameel; Elhayek, Ahmed; Stricker, Didier

doi:10.1007/978-3-030-01790-3_1


Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11162)

Included in the following conference series:

International Conference on Virtual Reality and Augmented Reality


Abstract

Hand pose tracking in 3D is an essential task for many virtual reality (VR) applications, such as games and manipulating virtual objects with bare hands. CNN-based learning methods achieve state-of-the-art accuracy by directly regressing 3D pose from a single depth image. However, the 3D pose estimated by these methods is coarse and kinematically unstable because the sparse joint positions are learned independently. In this paper, we propose a novel structure-aware CNN-based algorithm that learns to automatically segment the hand from a raw depth image and to estimate 3D hand pose jointly with new structural constraints. The constraints include finger lengths, distances between joints along the kinematic chain, and inter-finger distances. Learning these constraints helps maintain the structural relations between the estimated joint keypoints. We also convert the sparse representation of the hand skeleton into a dense one by interpolating n points between each pair of parent and child joints. Through a comprehensive evaluation, we show the effectiveness of our approach and demonstrate performance competitive with state-of-the-art methods on the public NYU hand pose dataset.
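The abstract names three structural quantities and a sparse-to-dense interpolation step without defining them in detail. The Python sketch below shows one plausible reading, computed on plain coordinate tuples rather than inside a CNN. Everything in it is an assumption for illustration, not the authors' network or loss: the helper names (`interpolate_bones`, `bone_lengths`, `finger_lengths`, `inter_finger_distances`, `structure_aware_loss`), the toy five-joint skeleton, the choice of fingertip pairs for inter-finger distances, the squared-error form, the default `n=2`, and the weights.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Bone = Tuple[int, int]  # (parent joint index, child joint index)


def interpolate_bones(joints: Sequence[Point], bones: Sequence[Bone], n: int) -> List[Point]:
    """Sparse-to-dense skeleton: keep the original joints and append n evenly
    spaced points strictly between each parent joint and its child joint."""
    dense = list(joints)
    for p, c in bones:
        (px, py, pz), (cx, cy, cz) = joints[p], joints[c]
        for k in range(1, n + 1):
            t = k / (n + 1)  # fraction of the way from parent to child
            dense.append((px + t * (cx - px), py + t * (cy - py), pz + t * (cz - pz)))
    return dense


def bone_lengths(joints: Sequence[Point], bones: Sequence[Bone]) -> List[float]:
    """Distances between parent and child joints along the kinematic chain."""
    return [math.dist(joints[p], joints[c]) for p, c in bones]


def finger_lengths(joints: Sequence[Point], fingers: Sequence[Sequence[int]]) -> List[float]:
    """Assumption: a finger is a chain of joint indices from base to tip, and
    its length is the sum of the distances between consecutive joints."""
    return [sum(math.dist(joints[a], joints[b]) for a, b in zip(chain, chain[1:]))
            for chain in fingers]


def inter_finger_distances(joints: Sequence[Point], tips: Sequence[int]) -> List[float]:
    """Assumption: inter-finger distances are the pairwise fingertip distances."""
    return [math.dist(joints[i], joints[j]) for i, j in combinations(tips, 2)]


def _sq_error(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of squared element-wise differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def structure_aware_loss(pred, gt, bones, fingers, tips, n=2, weights=(1.0, 1.0, 1.0, 1.0)):
    """Hypothetical loss: squared error on the densified joint positions plus
    squared error on each structural quantity. The weights are illustrative."""
    position = sum(_sq_error(p, g) for p, g in
                   zip(interpolate_bones(pred, bones, n), interpolate_bones(gt, bones, n)))
    terms = (
        position,
        _sq_error(bone_lengths(pred, bones), bone_lengths(gt, bones)),
        _sq_error(finger_lengths(pred, fingers), finger_lengths(gt, fingers)),
        _sq_error(inter_finger_distances(pred, tips), inter_finger_distances(gt, tips)),
    )
    return sum(w * t for w, t in zip(weights, terms))


# Toy skeleton (hypothetical): palm joint 0 and two fingers, 0-1-2 and 0-3-4.
BONES = [(0, 1), (1, 2), (0, 3), (3, 4)]
FINGERS = [[0, 1, 2], [0, 3, 4]]
TIPS = [2, 4]
GT = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 2.0, 0.0), (1.0, 1.0, 0.0), (1.0, 2.0, 0.0)]

if __name__ == "__main__":
    shifted = [(x, y, z + 1.0) for x, y, z in GT]  # rigid translation along z
    stretched = GT[:2] + [(0.0, 3.0, 0.0)] + GT[3:]  # first fingertip pulled out
    print(structure_aware_loss(shifted, GT, BONES, FINGERS, TIPS))
    print(structure_aware_loss(stretched, GT, BONES, FINGERS, TIPS))
```

In this sketch, translating every joint of a prediction raises only the position term, because bone lengths, finger lengths and fingertip distances are translation-invariant. Stretching a single finger, by contrast, is penalized by the bone-length, finger-length and inter-finger terms as well. This illustrates how such constraints tie the estimated keypoints to one another instead of treating them independently.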



Acknowledgements

This work has been partially funded by the Federal Ministry of Education and Research of the Federal Republic of Germany as part of the research projects DYNAMICS (Grant number 01IW15003) and VIDETE (Grant number 01IW18002).

Author information

Authors and Affiliations

Department Augmented Vision, DFKI Kaiserslautern, Kaiserslautern, Germany
Jameel Malik, Ahmed Elhayek & Didier Stricker
NUST-SEECS, Islamabad, Pakistan
Jameel Malik


Corresponding authors

Correspondence to Jameel Malik, Ahmed Elhayek or Didier Stricker.

Editor information

Editors and Affiliations

University of Paris-Sud, Orsay, France
Patrick Bourdot
University of Nottingham, Nottingham, UK
Sue Cobb
University of Minnesota, Minneapolis, MN, USA
Victoria Interrante
Nara Institute of Science and Technology, Ikoma, Japan
Hirokazu Kato
University of Kaiserslautern and DFKI, Kaiserslautern, Germany
Didier Stricker


About this paper

Cite this paper

Malik, J., Elhayek, A., Stricker, D. (2018). Structure-Aware 3D Hand Pose Regression from a Single Depth Image. In: Bourdot, P., Cobb, S., Interrante, V., Kato, H., Stricker, D. (eds) Virtual Reality and Augmented Reality. EuroVR 2018. Lecture Notes in Computer Science, vol 11162. Springer, Cham. https://doi.org/10.1007/978-3-030-01790-3_1


DOI: https://doi.org/10.1007/978-3-030-01790-3_1
Published: 26 September 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01789-7
Online ISBN: 978-3-030-01790-3
eBook Packages: Computer Science, Computer Science (R0)
