Accurate Object Detection with Location Relaxation and Regionlets Re-localization

  • Conference paper
Computer Vision – ACCV 2014 (ACCV 2014)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 9003)

Abstract

Standard sliding-window object detection requires evaluating the classifier at densely sampled locations in scale space to achieve accurate localization. To avoid such dense evaluation, selective-search-based algorithms evaluate the classifier only on a small subset of object proposals. Despite their demonstrated success, object proposals do not guarantee perfect overlap with the object, which leads to suboptimal detection accuracy. To address this issue, we propose to first relax the dense sampling of the scale space with coarse object proposals generated from bottom-up segmentations. Based on the detection results on these proposals, we then conduct a top-down search to localize the object more precisely using supervised descent. This two-stage detection strategy, dubbed location relaxation, is able to localize the object in the continuous parameter space. Furthermore, there is a conflict between accurate object detection and robust object detection, because achieving the latter requires accommodating inaccurate and perturbed object locations in the training phase. To address this conflict, we leverage the rich spatial information learned from the Regionlets detection framework to determine where the object is precisely localized. Our proposed approaches are extensively validated on the PASCAL VOC 2007 dataset and a self-collected large-scale car dataset. Our method boosts the mean average precision of the current state of the art from 41.7% to 44.1% on the PASCAL VOC 2007 dataset. To the best of our knowledge, this is the best performance reported without using outside data (convolutional-neural-network-based approaches are commonly pre-trained on a large-scale outside dataset and fine-tuned on the VOC dataset).
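The remark that object proposals do not guarantee perfect overlap with the object can be made concrete with the intersection-over-union (IoU) measure. The sketch below is illustrative only and is not taken from the paper: the function name `iou`, the `(l, t, r, b)` box tuples, and the example coordinates are assumptions. The 0.5 threshold is the standard PASCAL VOC detection criterion.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (left, top, right, bottom)."""
    # Corners of the intersection rectangle.
    left = max(box_a[0], box_b[0])
    top = max(box_a[1], box_b[1])
    right = min(box_a[2], box_b[2])
    bottom = min(box_a[3], box_b[3])

    # Clamp at zero so disjoint boxes yield an empty intersection.
    inter = max(0.0, right - left) * max(0.0, bottom - top)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical boxes: a ground-truth object, a coarse proposal shifted by
# 20 pixels, and a refined box shifted by only 5 pixels.
ground_truth = (0, 0, 100, 100)
coarse_proposal = (20, 20, 120, 120)
refined_box = (5, 5, 105, 105)

VOC_THRESHOLD = 0.5  # a detection counts as correct when IoU >= 0.5
print(iou(ground_truth, coarse_proposal) >= VOC_THRESHOLD)  # False
print(iou(ground_truth, refined_box) >= VOC_THRESHOLD)      # True
```

In this toy case a 20-pixel shift on a 100-pixel box already drops the IoU to 6400/13600, below the threshold, which is the kind of localization loss a refinement stage is meant to recover.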

Notes

  1. We empirically found that using the four coordinates for our localization model produces better performance than using \((x,y,s,a)\). Thus we choose \((l,t,r,b)\) in our Regionlets Re-localization approach.
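The two box parameterizations mentioned in this note can be related by a simple change of variables. The sketch below is not from the paper. It assumes that \((x,y)\) is the box center, \(s\) is a scale defined as the square root of the box area, and \(a\) is the width-to-height aspect ratio; the paper's exact definitions of \(s\) and \(a\) are not given here, and the function names are hypothetical.

```python
import math


def ltrb_to_xysa(l, t, r, b):
    """Convert (left, top, right, bottom) to (center x, center y, scale, aspect).

    Assumed conventions: scale = sqrt(width * height), aspect = width / height.
    """
    w, h = r - l, b - t
    return ((l + r) / 2.0, (t + b) / 2.0, math.sqrt(w * h), w / h)


def xysa_to_ltrb(x, y, s, a):
    """Inverse of ltrb_to_xysa under the same assumed conventions."""
    w = s * math.sqrt(a)  # width * height = s**2 and width / height = a
    h = s / math.sqrt(a)
    return (x - w / 2.0, y - h / 2.0, x + w / 2.0, y + h / 2.0)


# Round trip on a hypothetical 40 x 20 box.
box = (10.0, 20.0, 50.0, 40.0)
print(ltrb_to_xysa(*box))
print(xysa_to_ltrb(*ltrb_to_xysa(*box)))
```

Under these assumptions, a change in any one of \((x,y,s,a)\) moves several box edges at once, whereas each of \((l,t,r,b)\) moves exactly one edge.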

Acknowledgements

The main part of the work was carried out when the first author was a summer intern at NEC Laboratories America in Cupertino, CA. Research reported in this publication was also partly supported by the National Institute of Nursing Research of the National Institutes of Health under Award Number R01NR015371. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. This work is also partly supported by US National Science Foundation Grant IIS 1350763, China National Natural Science Foundation Grant 61228303, GH’s start-up funds from Stevens Institute of Technology, a Google Research Faculty Award, a gift grant from Microsoft Research, and a gift grant from NEC Labs America.

Author information

Corresponding author

Correspondence to Xiaoyu Wang.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Long, C., Wang, X., Hua, G., Yang, M., Lin, Y. (2015). Accurate Object Detection with Location Relaxation and Regionlets Re-localization. In: Cremers, D., Reid, I., Saito, H., Yang, MH. (eds) Computer Vision – ACCV 2014. ACCV 2014. Lecture Notes in Computer Science, vol 9003. Springer, Cham. https://doi.org/10.1007/978-3-319-16865-4_17

  • DOI: https://doi.org/10.1007/978-3-319-16865-4_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-16864-7

  • Online ISBN: 978-3-319-16865-4

  • eBook Packages: Computer Science, Computer Science (R0)
