Learning Hough Transform with Latent Structures for Joint Object Detection and Pose Estimation

  • Conference paper
MultiMedia Modeling (MMM 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9517)

Abstract

We present a novel max-margin Hough transform with latent structure for joint object detection and pose estimation. Our method addresses the large appearance and shape variation of objects across multiple poses by integrating three key components. First, we propose a more robust appearance model by designing a patch dictionary with complementary features. Second, we use a group of latent components to explicitly incorporate feature selection and pooling into the Hough-based object models. Third, we adopt a multiple instance learning approach to handle the lack of correspondence among training instances with noisy bounding-box labels. We design a unified objective and an efficient approximate inference procedure that alternates the search between object location and pose space. We demonstrate the efficacy of our approach by achieving state-of-the-art performance on two detection datasets and two joint estimation datasets.
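
To make the idea of Hough voting with an alternating search over location and pose concrete, the following is a minimal toy sketch, not the authors' model. It assumes a hypothetical dictionary in which each codeword stores, per pose, an integer offset to the object centre and a vote weight; the names `hough_score_maps`, `alternate_search`, `offsets`, and `weights` are illustrative and do not come from the paper. The max-margin learning, latent components, and multiple instance learning described above are not modelled here.

```python
import numpy as np

def hough_score_maps(patch_xy, codeword_ids, offsets, weights, shape):
    """Accumulate weighted centre votes into one score map per pose.

    patch_xy:     (N, 2) integer patch positions (row, col)
    codeword_ids: (N,) dictionary entry matched by each patch
    offsets:      (P, K, 2) assumed offset from codeword to object centre, per pose
    weights:      (P, K) assumed vote weight per pose and codeword
    shape:        (H, W) of the voting space
    """
    n_poses = offsets.shape[0]
    maps = np.zeros((n_poses, *shape))
    for p in range(n_poses):
        centres = patch_xy + offsets[p, codeword_ids]
        # Discard votes that fall outside the voting space.
        inside = ((centres >= 0) & (centres < np.array(shape))).all(axis=1)
        np.add.at(maps[p],
                  (centres[inside, 0], centres[inside, 1]),
                  weights[p, codeword_ids[inside]])
    return maps

def alternate_search(maps, pose=0, max_iters=10):
    """Coordinate ascent: best location for the current pose, then best pose
    for that location, repeated until neither changes."""
    loc = None
    for _ in range(max_iters):
        flat = np.argmax(maps[pose])
        new_loc = tuple(int(i) for i in np.unravel_index(flat, maps[pose].shape))
        new_pose = int(np.argmax(maps[:, new_loc[0], new_loc[1]]))
        if new_loc == loc and new_pose == pose:
            break
        loc, pose = new_loc, new_pose
    return loc, pose, float(maps[pose][loc])

# Toy data: two poses, two codewords, a 10x10 voting space.
offsets = np.array([[[0, 2], [0, -2]],    # pose 0: votes shift along columns
                    [[2, 0], [-2, 0]]])   # pose 1: votes shift along rows
weights = np.ones((2, 2))
patch_xy = np.array([[3, 5], [7, 5]])
codeword_ids = np.array([0, 1])

maps = hough_score_maps(patch_xy, codeword_ids, offsets, weights, (10, 10))

# The alternation can stall in a local optimum, so this sketch restarts
# from every pose and keeps the highest-scoring result.
loc, pose, score = max((alternate_search(maps, p) for p in range(len(maps))),
                       key=lambda r: r[2])
```

In this toy setup both patches agree on a single centre only under pose 1, so the restart from pose 1 finds the consistent hypothesis, while the run started from pose 0 stays at a weaker peak. This illustrates why such an alternating inference is approximate rather than exact.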

Acknowledgement

This work is supported by the National Natural Science Foundation of China (Project No. 61503168).

Author information

Corresponding author

Correspondence to Hanxi Li.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Li, H., He, X., Barnes, N., Wang, M. (2016). Learning Hough Transform with Latent Structures for Joint Object Detection and Pose Estimation. In: Tian, Q., Sebe, N., Qi, GJ., Huet, B., Hong, R., Liu, X. (eds) MultiMedia Modeling. MMM 2016. Lecture Notes in Computer Science, vol 9517. Springer, Cham. https://doi.org/10.1007/978-3-319-27674-8_11

  • DOI: https://doi.org/10.1007/978-3-319-27674-8_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-27673-1

  • Online ISBN: 978-3-319-27674-8

  • eBook Packages: Computer Science, Computer Science (R0)
