Scale-Adaptive Deconvolutional Regression Network for Pedestrian Detection

  • Conference paper
Computer Vision – ACCV 2016 (ACCV 2016)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 10112)

Abstract

Although the Region-based Convolutional Neural Network (R-CNN) family has shown promising results for object detection, it still faces great challenges in task-specific detection such as pedestrian detection, where the main difficulties are the large scale variation of pedestrians and the insufficient discriminative power of pedestrian features. To overcome these difficulties, we propose a novel Scale-Adaptive Deconvolutional Regression (SADR) network. Specifically, the proposed network detects pedestrians of various scales by flexibly choosing, according to pedestrian height, which feature layer is used to regress object locations, thus improving detection accuracy significantly. Furthermore, since a CNN abstracts features at different semantic levels in different layers, we fuse features from multiple layers to provide both local characteristics and global semantic information of the object for the final pedestrian classification, which improves the discriminative power of pedestrian features and further boosts detection performance. Extensive experiments verify the effectiveness of the proposed approach, which achieves a state-of-the-art log-average miss rate (MR) of 6.94% on the revised Caltech dataset [1] and a competitive result on KITTI.
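For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how a log-average miss rate is commonly computed, following the definition used by the Caltech benchmark evaluation [30]: the miss rate is sampled at nine false-positives-per-image (FPPI) values evenly spaced in log space between 10^-2 and 10^0, and the samples are combined by a geometric mean. The function name, the step-wise sampling rule, and the input format (FPPI sorted in ascending order, paired with miss rates) are assumptions made for illustration, not the benchmark's reference code.

```python
import math


def log_average_miss_rate(fppi, miss_rate, num_points=9):
    """Geometric mean of miss rates sampled at log-spaced FPPI values.

    Assumes `fppi` is sorted in ascending order and `miss_rate[i]` is the
    miss rate reached at `fppi[i]` (hypothetical input format).
    """
    # Reference FPPI values: 10^-2, 10^-1.75, ..., 10^0.
    refs = [10 ** (-2 + 2 * i / (num_points - 1)) for i in range(num_points)]

    samples = []
    for ref in refs:
        # Use the miss rate at the largest curve FPPI not exceeding the
        # reference point; if the curve starts later, count a full miss.
        reached = [mr for f, mr in zip(fppi, miss_rate) if f <= ref]
        samples.append(reached[-1] if reached else 1.0)

    # Geometric mean, with a small floor so that log(0) never occurs.
    log_sum = sum(math.log(max(s, 1e-10)) for s in samples)
    return math.exp(log_sum / len(samples))


# Usage: a detector whose curve only starts at FPPI = 0.5 with 10% miss rate.
print(log_average_miss_rate([0.5, 2.0], [0.1, 0.05]))
```

Because the mean is taken in log space, a curve that never reaches the low-FPPI reference points is penalised heavily, which is why the metric rewards detectors that stay accurate at very few false positives per image.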

References

  1. Zhang, S., Benenson, R., Omran, M., Hosang, J., Schiele, B.: How far are we from solving pedestrian detection? In: CVPR (2016)

  2. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2005, vol. 1, pp. 886–893. IEEE (2005)

  3. Dollár, P., Tu, Z., Perona, P., Belongie, S.: Integral channel features. In: BMVC (2009)

  4. Dollár, P., Appel, R., Belongie, S., Perona, P.: Fast feature pyramids for object detection. IEEE Trans. Pattern Anal. Mach. Intell. 36, 1532–1545 (2014)

  5. Zhang, S., Bauckhage, C., Cremers, A.: Informed Haar-like features improve pedestrian detection. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 947–954 (2014)

  6. Zhang, S., Benenson, R., Schiele, B.: Filtered channel features for pedestrian detection. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1751–1760. IEEE (2015)

  7. Felzenszwalb, P., McAllester, D., Ramanan, D.: A discriminatively trained, multiscale, deformable part model. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2008, pp. 1–8. IEEE (2008)

  8. Felzenszwalb, P.F., Girshick, R.B., McAllester, D., Ramanan, D.: Object detection with discriminatively trained part-based models. IEEE Trans. Pattern Anal. Mach. Intell. 32, 1627–1645 (2010)

  9. Felzenszwalb, P.F., Girshick, R.B., McAllester, D.: Cascade object detection with deformable part models. In: 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2241–2248. IEEE (2010)

  10. Sermanet, P., Kavukcuoglu, K., Chintala, S., LeCun, Y.: Pedestrian detection with unsupervised multi-stage feature learning. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 3626–3633 (2013)

  11. Ouyang, W., Wang, X.: A discriminative deep model for pedestrian detection with occlusion handling. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3258–3265. IEEE (2012)

  12. Ouyang, W., Wang, X.: Joint deep learning for pedestrian detection. In: Proceedings of IEEE International Conference on Computer Vision, pp. 2056–2063 (2013)

  13. Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014)

  14. He, K., Zhang, X., Ren, S., Sun, J.: Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 37, 1904–1916 (2015)

  15. Girshick, R.: Fast R-CNN. In: Proceedings of IEEE International Conference on Computer Vision, pp. 1440–1448 (2015)

  16. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems, pp. 91–99 (2015)

  17. Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440 (2015)

  18. Noh, H., Hong, S., Han, B.: Learning deconvolution network for semantic segmentation. In: 2015 IEEE International Conference on Computer Vision (ICCV) (2015)

  19. Badrinarayanan, V., Handa, A., Cipolla, R.: SegNet: a deep convolutional encoder-decoder architecture for robust semantic pixel-wise labelling (2015). arXiv preprint arXiv:1505.07293

  20. Dong, C., Loy, C.C., He, K., Tang, X.: Image super-resolution using deep convolutional networks (2015)

  21. Bell, S., Zitnick, C.L., Bala, K., Girshick, R.: Inside-outside net: detecting objects in context with skip pooling and recurrent neural networks (2015). arXiv preprint arXiv:1512.04143

  22. Hariharan, B., Arbeláez, P., Girshick, R., Malik, J.: Hypercolumns for object segmentation and fine-grained localization. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 447–456 (2015)

  23. Zagoruyko, S., Lerer, A., Lin, T.Y., Pinheiro, P.O., Gross, S., Chintala, S., Dollár, P.: A multipath network for object detection (2016). arXiv preprint arXiv:1604.02135

  24. Cai, Z., Saberian, M., Vasconcelos, N.: Learning complexity-aware cascades for deep pedestrian detection. In: Proceedings of IEEE International Conference on Computer Vision, pp. 3361–3369 (2015)

  25. Uijlings, J.R., van de Sande, K.E., Gevers, T., Smeulders, A.W.: Selective search for object recognition. Int. J. Comput. Vis. 104, 154–171 (2013)

  26. Zitnick, C.L., Dollár, P.: Edge boxes: locating object proposals from edges. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 391–405. Springer, Heidelberg (2014). doi:10.1007/978-3-319-10602-1_26

  27. Arbeláez, P., Pont-Tuset, J., Barron, J., Marques, F., Malik, J.: Multiscale combinatorial grouping. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 328–335 (2014)

  28. Hosang, J., Benenson, R., Omran, M., Schiele, B.: Taking a deeper look at pedestrians. In: CVPR (2015)

  29. Tian, Y., Luo, P., Wang, X., Tang, X.: Pedestrian detection aided by deep learning semantic tasks. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 5079–5087 (2015)

  30. Dollár, P., Wojek, C., Schiele, B., Perona, P.: Pedestrian detection: an evaluation of the state of the art. IEEE Trans. Pattern Anal. Mach. Intell. 34, 743–761 (2012)

  31. Geiger, A., Lenz, P., Urtasun, R.: Are we ready for autonomous driving? The KITTI vision benchmark suite. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3354–3361. IEEE (2012)

  32. Li, J., Liang, X., Shen, S., Xu, T., Yan, S.: Scale-aware fast R-CNN for pedestrian detection (2015). arXiv preprint arXiv:1510.08160

  33. Yang, F., Choi, W., Lin, Y.: Exploit all the layers: fast and accurate CNN object detector with scale dependent pooling and cascaded rejection classifiers. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (2016)

  34. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition (2014). arXiv preprint arXiv:1409.1556

  35. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition (2015). arXiv preprint arXiv:1512.03385

  36. Gidaris, S., Komodakis, N.: Object detection via a multi-region and semantic segmentation-aware CNN model. In: Proceedings of IEEE International Conference on Computer Vision, pp. 1134–1142 (2015)

  37. Yang, B., Yan, J., Lei, Z., Li, S.Z.: Convolutional channel features. In: Proceedings of IEEE International Conference on Computer Vision, pp. 82–90 (2015)

  38. Nam, W., Dollár, P., Han, J.H.: Local decorrelation for improved pedestrian detection. In: Advances in Neural Information Processing Systems, pp. 424–432 (2014)

  39. Benenson, R., Omran, M., Hosang, J., Schiele, B.: Ten years of pedestrian detection, what have we learned? In: Agapito, L., Bronstein, M.M., Rother, C. (eds.) ECCV 2014. LNCS, vol. 8926, pp. 613–627. Springer, Heidelberg (2015). doi:10.1007/978-3-319-16181-5_47

  40. Paisitkriangkrai, S., Shen, C., Hengel, A.: Strengthening the effectiveness of pedestrian detection with spatially pooled features. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8692, pp. 546–561. Springer, Heidelberg (2014). doi:10.1007/978-3-319-10593-2_36

  41. Chen, X., Kundu, K., Zhu, Y., Berneshawi, A., Ma, H., Fidler, S., Urtasun, R.: 3D object proposals for accurate object class detection. In: NIPS (2015)

  42. Chen, X., Kundu, K., Zhang, Z., Ma, H., Fidler, S., Urtasun, R.: Monocular 3D object detection for autonomous driving. In: CVPR (2016)

Acknowledgment

This work was supported by 863 Program 2014AA015104 and by National Science Foundation of China grants 61273034 and 61332016.

Author information

Corresponding author

Correspondence to Yousong Zhu.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Zhu, Y., Wang, J., Zhao, C., Guo, H., Lu, H. (2017). Scale-Adaptive Deconvolutional Regression Network for Pedestrian Detection. In: Lai, SH., Lepetit, V., Nishino, K., Sato, Y. (eds) Computer Vision – ACCV 2016. ACCV 2016. Lecture Notes in Computer Science, vol 10112. Springer, Cham. https://doi.org/10.1007/978-3-319-54184-6_26

  • DOI: https://doi.org/10.1007/978-3-319-54184-6_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-54183-9

  • Online ISBN: 978-3-319-54184-6

  • eBook Packages: Computer Science, Computer Science (R0)
