Learning Hough Transform with Latent Structures for Joint Object Detection and Pose Estimation

Li, Hanxi; He, Xuming; Barnes, Nick; Wang, Mingwen

doi:10.1007/978-3-319-27674-8_11

Hanxi Li¹⁹,²⁰, Xuming He²⁰, Nick Barnes²⁰ & Mingwen Wang¹⁹

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9517)

Included in the following conference series:

International Conference on Multimedia Modeling

Abstract

We present a novel max-margin Hough transform with latent structure for joint object detection and pose estimation. Our method addresses the large appearance and shape variation of objects in multiple poses by integrating three key components. First, we propose a more robust appearance model by designing a patch dictionary with complementary features. Second, we use a group of latent components to explicitly incorporate feature selection and pooling into the Hough-based object models. Third, we adopt a multiple instance learning approach to handle the lack of correspondence among training instances with noisy bounding-box labels. We design a unified objective and an efficient approximate inference procedure that alternates the search between object location and pose space. We demonstrate the efficacy of our approach by achieving state-of-the-art performance on two detection datasets and two joint estimation datasets.
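
The alternating search between object location and pose mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the chapter body is not reproduced here, so the vote format, the weight table, and the function names `hough_votes` and `alternating_search` are all hypothetical. The sketch only shows the general idea of accumulating weighted Hough votes and then running coordinate ascent, fixing the pose to pick the best location and fixing the location to pick the best pose, until neither changes.

```python
from collections import defaultdict


def hough_votes(patches, weights):
    """Accumulate weighted votes per (pose, location) cell.

    patches: list of (codeword_id, pose, voted_location) tuples.
             This format is an assumption made for illustration.
    weights: dict mapping (codeword_id, pose) to a weight; in a learned
             model these would come from training (assumed here).
    """
    acc = defaultdict(float)
    for codeword, pose, loc in patches:
        acc[(pose, loc)] += weights.get((codeword, pose), 0.0)
    return acc


def alternating_search(acc, poses, locations, init_pose, max_iters=10):
    """Coordinate ascent over pose and location (an approximate search).

    Starting from init_pose, alternately pick the best location for the
    current pose and the best pose for the current location. The result
    is a local maximum and may depend on init_pose.
    """
    pose = init_pose
    loc = max(locations, key=lambda l: acc.get((pose, l), 0.0))
    for _ in range(max_iters):
        new_pose = max(poses, key=lambda p: acc.get((p, loc), 0.0))
        new_loc = max(locations, key=lambda l: acc.get((new_pose, l), 0.0))
        if (new_pose, new_loc) == (pose, loc):
            break  # neither coordinate changed, so the search has converged
        pose, loc = new_pose, new_loc
    return pose, loc, acc.get((pose, loc), 0.0)


# Toy usage with made-up votes and weights.
patches = [
    (0, "front", (5, 5)),
    (1, "front", (5, 5)),
    (0, "side", (2, 2)),
    (2, "side", (5, 5)),
]
weights = {(0, "front"): 1.0, (1, "front"): 0.5, (0, "side"): 0.8, (2, "side"): 2.0}
acc = hough_votes(patches, weights)
best = alternating_search(acc, ["front", "side"], [(2, 2), (5, 5)], init_pose="front")
```

Because each step only searches one variable at a time, the cost per iteration is the number of poses plus the number of locations rather than their product, which is the usual motivation for this kind of alternating scheme. The trade-off is that it can stop at a local maximum.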

Acknowledgement

This work is supported by the National Natural Science Foundation of China (Project No. 61503168).

Author information

Authors and Affiliations

School of Computer and Information Engineering, Jiangxi Normal University, Nanchang, China
Hanxi Li & Mingwen Wang
National ICT Australia, Canberra Research Laboratory, Canberra, Australia
Hanxi Li, Xuming He & Nick Barnes

Corresponding author

Correspondence to Hanxi Li.

Editor information

Editors and Affiliations

University of Texas at San Antonio, San Antonio, USA
Qi Tian
Dept. of Information Engineering, University of Trento, Povo, Trento, Italy
Nicu Sebe
EECS, University of Central Florida, Orlando, Florida, USA
Guo-Jun Qi
EURECOM, Sophia-Antipolis, France
Benoit Huet
Hefei University of Technology, Hefei, Anhui, China
Richang Hong
School of Computing and Information, Hefei University of Technology, Hefei, Anhui, China
Xueliang Liu

About this paper

Cite this paper

Li, H., He, X., Barnes, N., Wang, M. (2016). Learning Hough Transform with Latent Structures for Joint Object Detection and Pose Estimation. In: Tian, Q., Sebe, N., Qi, GJ., Huet, B., Hong, R., Liu, X. (eds) MultiMedia Modeling. MMM 2016. Lecture Notes in Computer Science, vol 9517. Springer, Cham. https://doi.org/10.1007/978-3-319-27674-8_11

DOI: https://doi.org/10.1007/978-3-319-27674-8_11
Published: 01 January 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-27673-1
Online ISBN: 978-3-319-27674-8
eBook Packages: Computer Science, Computer Science (R0)
