Efficient Human Pose Estimation from Single Depth Images

Shotton, J.; Girshick, R.; Fitzgibbon, A.; Sharp, T.; Cook, M.; Finocchio, M.; Moore, R.; Kohli, P.; Criminisi, A.; Kipman, A.; Blake, A.

doi:10.1007/978-1-4471-4929-3_13

Part of the book series: Advances in Computer Vision and Pattern Recognition (ACVPR)

Abstract

We describe two new approaches to human pose estimation. Both can quickly and accurately predict the 3D positions of body joints from a single depth image, without using any temporal information. The key to both approaches is the use of a large, realistic, and highly varied synthetic set of training images. This allows us to learn models that are largely invariant to factors such as pose, body shape, and field-of-view cropping. Our first approach employs an intermediate body parts representation, designed so that an accurate per-pixel classification of the parts will localize the joints of the body. The second approach instead directly regresses the positions of body joints. By using simple depth pixel comparison features, and parallelizable decision forests, both approaches can run super-realtime on consumer hardware. Our evaluation investigates many aspects of our methods, and compares the approaches to each other and to the state of the art. Parts of this chapter are reprinted, with permission, from Shotton et al., Proc IEEE Conf. Computer Vision and Pattern Recognition (CVPR) (2011), © 2011 IEEE.
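As a rough illustration of what a "simple depth pixel comparison feature" might look like, the sketch below computes the difference between two depth probes around a pixel and compares it to a threshold, as a decision tree split node would. The abstract does not specify the feature form: the depth-normalised offsets, the large background constant, and all names here (`probe`, `depth_feature`, `split_goes_left`, `BACKGROUND`) are assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Tuple

Depth = List[List[float]]

# Assumption: probes that fall outside the image read a large constant depth.
BACKGROUND = 1e6


def probe(depth: Depth, row: float, col: float) -> float:
    """Read the depth at the nearest pixel, or BACKGROUND if out of bounds."""
    r, c = int(round(row)), int(round(col))
    if 0 <= r < len(depth) and 0 <= c < len(depth[0]):
        return depth[r][c]
    return BACKGROUND


def depth_feature(depth: Depth, pixel: Tuple[int, int],
                  u: Tuple[float, float], v: Tuple[float, float]) -> float:
    """Difference of two depth probes at offsets u and v from `pixel`.

    Assumption: offsets are divided by the depth at `pixel` so that the
    probe pattern shrinks for distant bodies. Depth must be positive.
    """
    r, c = pixel
    d = depth[r][c]
    a = probe(depth, r + u[0] / d, c + u[1] / d)
    b = probe(depth, r + v[0] / d, c + v[1] / d)
    return a - b


def split_goes_left(depth: Depth, pixel: Tuple[int, int],
                    u: Tuple[float, float], v: Tuple[float, float],
                    threshold: float) -> bool:
    """Binary test a split node could apply: route left if feature < threshold."""
    return depth_feature(depth, pixel, u, v) < threshold


# Tiny hypothetical 3x3 depth image (arbitrary units) for demonstration.
image = [
    [2.0, 2.0, 2.0],
    [2.0, 2.0, 4.0],
    [2.0, 2.0, 2.0],
]
feature = depth_feature(image, (1, 1), (0.0, 2.0), (0.0, -2.0))
```

Each pixel can be evaluated independently of every other pixel with a test of this kind, which is consistent with the abstract's point that the forests parallelize well.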

This work was undertaken at Microsoft Research, Cambridge, in collaboration with Xbox. See http://research.microsoft.com/vision/. Ross Girshick is currently a postdoctoral fellow at UC Berkeley.

Notes

1. We use K to indicate the maximum number of relative votes allowed. In practice we allow some leaf nodes to store fewer than K votes for some joints.
2. Recall that for notational simplicity we are assuming u defines a pixel 2D position in a particular image; the ground truth joint positions P will therefore correspond for each particular image.
3. This threshold could equivalently be applied at test time though would waste memory in the tree.
4. The results for ojr at 300k images were so compelling we chose not to expend the considerable energy in training a directly comparable 900k forest.

References

Belongie S, Malik J, Puzicha J (2002) Shape matching and object recognition using shape contexts. IEEE Trans Pattern Anal Mach Intell 24
Bourdev L, Malik J (2009) Poselets: body part detectors trained using 3D human pose annotations. In: Proc IEEE intl conf on computer vision (ICCV)
Bregler C, Malik J (1998) Tracking people with twists and exponential maps. In: Proc IEEE conf computer vision and pattern recognition (CVPR)
Brubaker MA, Fleet DJ, Hertzmann A (2010) Physics-based person tracking using the anthropomorphic walker. Int J Comput Vis
Comaniciu D, Meer P (2002) Mean shift: a robust approach toward feature space analysis. IEEE Trans Pattern Anal Mach Intell 24(5)
Criminisi A, Shotton J, Robertson D, Konukoglu E (2010) Regression forests for efficient anatomy detection and localization in CT studies. In: MICCAI workshop on medical computer vision: recognition techniques and applications in medical imaging, Beijing. Springer, Berlin
Criminisi A, Shotton J, Konukoglu E (2012) Decision forests: a unified framework for classification, regression, density estimation, manifold learning and semi-supervised learning. Found Trends Comput Graph Vis 7(2–3)
Fergus R, Perona P, Zisserman A (2003) Object class recognition by unsupervised scale-invariant learning. In: Proc IEEE conf computer vision and pattern recognition (CVPR)
Gall J, Lempitsky V (2009) Class-specific Hough forests for object detection. IEEE Trans Pattern Anal Mach Intell
Ganapathi V, Plagemann C, Koller D, Thrun S (2010) Real time motion capture using a single time-of-flight camera. In: Proc IEEE conf computer vision and pattern recognition (CVPR). IEEE, New York
Girshick R, Shotton J, Kohli P, Criminisi A, Fitzgibbon A (2011) Efficient regression of general-activity human poses from depth images. In: Proc IEEE intl conf on computer vision (ICCV)
Grest D, Woetzel J, Koch R (2005) Nonlinear body pose estimation from depth images. In: Proc annual symposium of the German association for pattern recognition (DAGM)
Hastie T, Tibshirani R, Friedman J, Franklin J (2005) The elements of statistical learning: data mining, inference and prediction. Math Intell 27(2)
Knoop S, Vacek S, Dillmann R (2006) Sensor fusion for 3D human body tracking with an articulated 3D body model. In: Proc IEEE intl conf on robotics and automation (ICRA)
Leibe B, Leonardis A, Schiele B (2008) Robust object detection with interleaved categorization and segmentation. Int J Comput Vis 77(1–3)
Lepetit V, Lagger P, Fua P (2005) Randomized trees for real-time keypoint recognition. In: Proc IEEE conf computer vision and pattern recognition (CVPR)
Microsoft Corporation Kinect for Windows and Xbox 360
Müller J, Arens M (2010) Human pose estimation with implicit shape models. In: ARTEMIS
Plagemann C, Ganapathi V, Koller D, Thrun S (2010) Real-time identification and localization of body parts from depth images. In: Proc IEEE intl conf on robotics and automation (ICRA)
Sharp T (2008) Implementing decision trees and forests on a GPU. In: Proc European conf on computer vision (ECCV). Springer, Berlin
Shotton J, Winn J, Rother C, Criminisi A (2006) TextonBoost: Joint appearance, shape and context modeling for multi-class object recognition and segmentation. In: Proc European conf on computer vision (ECCV). Springer, Berlin
Shotton J, Fitzgibbon AW, Cook M, Sharp T, Finocchio M, Moore R, Kipman A, Blake A (2011) Real-time human pose recognition in parts from a single depth image. In: Proc IEEE conf computer vision and pattern recognition (CVPR)
Shotton J, Girshick R, Fitzgibbon A, Sharp T, Cook M, Finocchio M, Moore R, Kohli P, Criminisi A, Kipman A, Blake A (2012) Efficient human pose estimation from single depth images. IEEE Trans Pattern Anal Mach Intell
Siddiqui M, Medioni G (2010) Human pose estimation from a single view point, real-time range sensor. In: CVCG at CVPR
Sigal L, Bhatia S, Roth S, Black MJ, Isard M (2004) Tracking loose-limbed people. In: Proc IEEE conf computer vision and pattern recognition (CVPR)
Urtasun R, Darrell T (2008) Local probabilistic regression for activity-independent human pose inference. In: Proc IEEE conf computer vision and pattern recognition (CVPR)
Vitter JS (1985) Random sampling with a reservoir. ACM Trans Math Softw 11(1)
Wang RY, Popović J (2009) Real-time hand-tracking with a color glove. In: Proc ACM SIGGRAPH
Winn J, Shotton J (2006) The layout consistent random field for recognizing and segmenting partially occluded objects. In: Proc IEEE conf computer vision and pattern recognition (CVPR)
Zhu Y, Fujimura K (2007) Constrained optimization for human pose estimation from depth sequences. In: Proc Asian conf on computer vision (ACCV)

Author information

Authors and Affiliations

Microsoft Research Ltd., 7 J.J. Thomson Avenue, Cambridge, CB3 0FB, UK
J. Shotton, A. Fitzgibbon, T. Sharp, M. Cook, P. Kohli, A. Criminisi & A. Blake
University of California, Berkeley, CA, USA
R. Girshick
Microsoft Corporation, Redmond, WA, USA
M. Finocchio & A. Kipman
ST-Ericsson, Redmond, WA, USA
R. Moore

Editor information

Editors and Affiliations

Microsoft Research Ltd., 7 J.J. Thomson Avenue, Cambridge, CB3 0FB, UK
A. Criminisi & J. Shotton

About this chapter

Cite this chapter

Shotton, J. et al. (2013). Efficient Human Pose Estimation from Single Depth Images. In: Criminisi, A., Shotton, J. (eds) Decision Forests for Computer Vision and Medical Image Analysis. Advances in Computer Vision and Pattern Recognition. Springer, London. https://doi.org/10.1007/978-1-4471-4929-3_13

DOI: https://doi.org/10.1007/978-1-4471-4929-3_13
Publisher Name: Springer, London
Print ISBN: 978-1-4471-4928-6
Online ISBN: 978-1-4471-4929-3
eBook Packages: Computer Science, Computer Science (R0)
