Scale-Adaptive Deconvolutional Regression Network for Pedestrian Detection

Zhu, Yousong; Wang, Jinqiao; Zhao, Chaoyang; Guo, Haiyun; Lu, Hanqing

doi:10.1007/978-3-319-54184-6_26

Yousong Zhu^17,18,
Jinqiao Wang^17,18,
Chaoyang Zhao^17,18,
Haiyun Guo^17,18 &
…
Hanqing Lu^17,18

Part of the book series: Lecture Notes in Computer Science ((LNIP,volume 10112))

Included in the following conference series:

Asian Conference on Computer Vision

2206 Accesses
6 Citations

Abstract

Although the Region-based Convolutional Neural Network (R-CNN) families have shown promising results for object detection, they still face great challenges for task-specific detection, e.g., pedestrian detection, the current difficulties of which mainly lie in the large scale variations of pedestrians and insufficient discriminative power of pedestrian features. To overcome these difficulties, we propose a novel Scale-Adaptive Deconvolutional Regression (SADR) network in this paper. Specifically, the proposed network can effectively detect pedestrians of various scales by flexibly choosing which feature layer to regress object locations according to the height of pedestrians, thus improving the detection accuracy significantly. Furthermore, considering CNN can abstract different semantic-level features from different layers, we fuse features from multiple layers to provide both local characteristics and global semantic information of the object for final pedestrian classification, which improves the discriminative power of pedestrian features and boosts the detection performance further. Extensive experiments have verified the effectiveness of our proposed approach, which achieves the state-of-the-art log-average miss rate (MR) of 6.94% on the revised Caltech [1] and a competitive result on KITTI.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

Zhang, S., Benenson, R., Omran, M., Hosang, J., Schiele, B.: How far are we from solving pedestrian detection? In: CVPR (2016)
Google Scholar
Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2005, vol. 1, pp. 886–893. IEEE (2005)
Google Scholar
Dollár, P., Tu, Z., Perona, P., Belongie, S.: Integral channel features (2009)
Google Scholar
Dollár, P., Appel, R., Belongie, S., Perona, P.: Fast feature pyramids for object detection. IEEE Trans. Pattern Anal. Mach. Intell. 36, 1532–1545 (2014)
Article Google Scholar
Zhang, S., Bauckhage, C., Cremers, A.: Informed haar-like features improve pedestrian detection. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 947–954 (2014)
Google Scholar
Zhang, S., Benenson, R., Schiele, B.: Filtered channel features for pedestrian detection. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1751–1760. IEEE (2015)
Google Scholar
Felzenszwalb, P., McAllester, D., Ramanan, D.: A discriminatively trained, multiscale, deformable part model. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2008, pp. 1–8. IEEE (2008)
Google Scholar
Felzenszwalb, P.F., Girshick, R.B., McAllester, D., Ramanan, D.: Object detection with discriminatively trained part-based models. IEEE Trans. Pattern Anal. Mach. Intell. 32, 1627–1645 (2010)
Article Google Scholar
Felzenszwalb, P.F., Girshick, R.B., McAllester, D.: Cascade object detection with deformable part models. In: 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2241–2248. IEEE (2010)
Google Scholar
Sermanet, P., Kavukcuoglu, K., Chintala, S., LeCun, Y.: Pedestrian detection with unsupervised multi-stage feature learning. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 3626–3633 (2013)
Google Scholar
Ouyang, W., Wang, X.: A discriminative deep model for pedestrian detection with occlusion handling. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3258–3265. IEEE (2012)
Google Scholar
Ouyang, W., Wang, X.: Joint deep learning for pedestrian detection. In: Proceedings of IEEE International Conference on Computer Vision, pp. 2056–2063 (2013)
Google Scholar
Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014)
Google Scholar
He, K., Zhang, X., Ren, S., Sun, J.: Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 37, 1904–1916 (2015)
Article Google Scholar
Girshick, R.: Fast R-CNN. In: Proceedings of IEEE International Conference on Computer Vision, pp. 1440–1448 (2015)
Google Scholar
Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems, pp. 91–99 (2015)
Google Scholar
Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440 (2015)
Google Scholar
Noh, H., Hong, S., Han, B.: Learning deconvolution network for semantic segmentation. In: 2015 IEEE International Conference on Computer Vision (ICCV) (2015)
Google Scholar
Badrinarayanan, V., Handa, A., Cipolla, R.: Segnet: A deep convolutional encoder-decoder architecture for robust semantic pixel-wise labelling (2015). arXiv preprint arXiv:1505.07293
Dong, C., Loy, C.C., He, K., Tang, X.: Image super-resolution using deep convolutional networks (2015)
Google Scholar
Bell, S., Zitnick, C.L., Bala, K., Girshick, R.: Inside-outside net: detecting objects in context with skip pooling and recurrent neural networks (2015). arXiv preprint arXiv:1512.04143
Hariharan, B., Arbeláez, P., Girshick, R., Malik, J.: Hypercolumns for object segmentation and fine-grained localization. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 447–456 (2015)
Google Scholar
Zagoruyko, S., Lerer, A., Lin, T.Y., Pinheiro, P.O., Gross, S., Chintala, S., Dollár, P.: A multipath network for object detection (2016). arXiv preprint arXiv:1604.02135
Cai, Z., Saberian, M., Vasconcelos, N.: Learning complexity-aware cascades for deep pedestrian detection. In: Proceedings of IEEE International Conference on Computer Vision, pp. 3361–3369 (2015)
Google Scholar
Uijlings, J.R., van de Sande, K.E., Gevers, T., Smeulders, A.W.: Selective search for object recognition. Int. J. Comput. Vis. 104, 154–171 (2013)
Article Google Scholar
Zitnick, C.L., Dollár, P.: Edge boxes: locating object proposals from edges. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 391–405. Springer, Heidelberg (2014). doi:10.1007/978-3-319-10602-1_26
Google Scholar
Arbeláez, P., Pont-Tuset, J., Barron, J., Marques, F., Malik, J.: Multiscale combinatorial grouping. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 328–335 (2014)
Google Scholar
Hosang, J., Benenson, R., Omran, M., Schiele, B.: Taking a deeper look at pedestrians. In: CVPR (2015)
Google Scholar
Tian, Y., Luo, P., Wang, X., Tang, X.: Pedestrian detection aided by deep learning semantic tasks. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 5079–5087 (2015)
Google Scholar
Dollar, P., Wojek, C., Schiele, B., Perona, P.: Pedestrian detection: an evaluation of the state of the art. IEEE Trans. Pattern Anal. Mach. Intell. 34, 743–761 (2012)
Article Google Scholar
Geiger, A., Lenz, P., Urtasun, R.: Are we ready for autonomous driving? The KITTI vision benchmark suite. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3354–3361. IEEE (2012)
Google Scholar
Li, J., Liang, X., Shen, S., Xu, T., Yan, S.: Scale-aware fast R-CNN for pedestrian detection (2015). arXiv preprint arXiv:1510.08160
Yang, F., Choi, W., Lin, Y.: Exploit all the layers: fast and accurate CNN object detector with scale dependent pooling and cascaded rejection classifiers. In: Proceedings of IEEE International Conference on Computer Vision and Pattern Recognition (2016)
Google Scholar
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition (2014). arXiv preprint arXiv:1409.1556
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition (2015). arXiv preprint arXiv:1512.03385
Gidaris, S., Komodakis, N.: Object detection via a multi-region and semantic segmentation-aware CNN model. In: Proceedings of IEEE International Conference on Computer Vision, pp. 1134–1142 (2015)
Google Scholar
Yang, B., Yan, J., Lei, Z., Li, S.Z.: Convolutional channel features. In: Proceedings of IEEE International Conference on Computer Vision, pp. 82–90 (2015)
Google Scholar
Nam, W., Dollár, P., Han, J.H.: Local decorrelation for improved pedestrian detection. In: Advances in Neural Information Processing Systems, pp. 424–432 (2014)
Google Scholar
Benenson, R., Omran, M., Hosang, J., Schiele, B.: Ten years of pedestrian detection, what have we learned? In: Agapito, L., Bronstein, M.M., Rother, C. (eds.) ECCV 2014. LNCS, vol. 8926, pp. 613–627. Springer, Heidelberg (2015). doi:10.1007/978-3-319-16181-5_47
Google Scholar
Paisitkriangkrai, S., Shen, C., Hengel, A.: Strengthening the effectiveness of pedestrian detection with spatially pooled features. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8692, pp. 546–561. Springer, Heidelberg (2014). doi:10.1007/978-3-319-10593-2_36
Google Scholar
Chen, X., Kundu, K., Zhu, Y., Berneshawi, A., Ma, H., Fidler, S., Urtasun, R.: 3D object proposals for accurate object class detection. In: NIPS (2015)
Google Scholar
Chen, X., Kundu, K., Zhang, Z., Ma, H., Fidler, S., Urtasun, R.: Monocular 3D object detection for autonomous driving. In: CVPR (2016)
Google Scholar

Download references

Acknowledgment

This work was supported by 863 Program 2014AA015104, and National Science Foundation of China 61273034, and 61332016.

Author information

Authors and Affiliations

National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China
Yousong Zhu, Jinqiao Wang, Chaoyang Zhao, Haiyun Guo & Hanqing Lu
University of Chinese Academy of Sciences, Beijing, China
Yousong Zhu, Jinqiao Wang, Chaoyang Zhao, Haiyun Guo & Hanqing Lu

Authors

Yousong Zhu
View author publications
You can also search for this author in PubMed Google Scholar
Jinqiao Wang
View author publications
You can also search for this author in PubMed Google Scholar
Chaoyang Zhao
View author publications
You can also search for this author in PubMed Google Scholar
Haiyun Guo
View author publications
You can also search for this author in PubMed Google Scholar
Hanqing Lu
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Yousong Zhu .

Editor information

Editors and Affiliations

National Tsing Hua University, Hsinchu, Taiwan
Shang-Hong Lai
Graz University of Technology, Graz, Austria
Vincent Lepetit
Drexel University, Philadelphia, Pennsylvania, USA
Ko Nishino
The University of Tokyo, Tokyo, Japan
Yoichi Sato

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Zhu, Y., Wang, J., Zhao, C., Guo, H., Lu, H. (2017). Scale-Adaptive Deconvolutional Regression Network for Pedestrian Detection. In: Lai, SH., Lepetit, V., Nishino, K., Sato, Y. (eds) Computer Vision – ACCV 2016. ACCV 2016. Lecture Notes in Computer Science(), vol 10112. Springer, Cham. https://doi.org/10.1007/978-3-319-54184-6_26

Download citation

DOI: https://doi.org/10.1007/978-3-319-54184-6_26
Published: 10 March 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-54183-9
Online ISBN: 978-3-319-54184-6
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics