ContactPose: A Dataset of Grasps with Object Contact and Hand Pose

Brahmbhatt, Samarth; Tang, Chengcheng; Twigg, Christopher D.; Kemp, Charles C.; Hays, James

doi:10.1007/978-3-030-58601-0_22

Samarth Brahmbhatt (ORCID: orcid.org/0000-0002-3732-8865),
Chengcheng Tang,
Christopher D. Twigg,
Charles C. Kemp &
James Hays

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12358)

Included in the following conference series:

European Conference on Computer Vision

Abstract

Grasping is natural for humans. However, it involves complex hand configurations and soft tissue deformation that can result in complicated regions of contact between the hand and the object. Understanding and modeling this contact can potentially improve hand models, AR/VR experiences, and robotic grasping. Yet, we currently lack datasets of hand-object contact paired with other data modalities, which are crucial for developing and evaluating contact modeling techniques. We introduce ContactPose, the first dataset of hand-object contact paired with hand pose, object pose, and RGB-D images. ContactPose has 2306 unique grasps of 25 household objects grasped with 2 functional intents by 50 participants, and more than 2.9M RGB-D grasp images. Analysis of ContactPose data reveals interesting relationships between hand pose and contact. We use this data to rigorously evaluate various data representations, heuristics from the literature, and learning methods for contact modeling. Data, code, and trained models are available at https://contactpose.cc.gatech.edu.

Acknowledgements

We are thankful to the anonymous reviewers for helping improve this paper. We would also like to thank Elise Campbell, Braden Copple, David Dimond, Vivian Lo, Jeremy Schichtel, Steve Olsen, Lingling Tao, Sue Tunstall, Robert Wang, Ed Wei, and Yuting Ye for discussions and logistics help.

Author information

Authors and Affiliations

Georgia Tech, Atlanta, GA, USA
Samarth Brahmbhatt, Charles C. Kemp & James Hays
Argo AI, Pittsburgh, USA
James Hays
Facebook Reality Labs, Pittsburgh, USA
Chengcheng Tang & Christopher D. Twigg

Corresponding author

Correspondence to Samarth Brahmbhatt.

Editor information

Editors and Affiliations

University of Oxford, Oxford, UK
Andrea Vedaldi
Graz University of Technology, Graz, Austria
Horst Bischof
University of Freiburg, Freiburg im Breisgau, Germany
Thomas Brox
University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Jan-Michael Frahm

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 54309 KB)

About this paper

Cite this paper

Brahmbhatt, S., Tang, C., Twigg, C.D., Kemp, C.C., Hays, J. (2020). ContactPose: A Dataset of Grasps with Object Contact and Hand Pose. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds) Computer Vision – ECCV 2020. Lecture Notes in Computer Science, vol 12358. Springer, Cham. https://doi.org/10.1007/978-3-030-58601-0_22

DOI: https://doi.org/10.1007/978-3-030-58601-0_22
Published: 28 November 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58600-3
Online ISBN: 978-3-030-58601-0
eBook Packages: Computer Science, Computer Science (R0)