
Sequential Feature Fusion for Object Detection

  • Conference paper

Advances in Multimedia Information Processing – PCM 2018 (PCM 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11164)


Abstract

In an image, the category and location of an object relate to its global, spatial, and contextual visual information, all of which are important for accurate and efficient object detection. In this paper, we propose a region-based detector named the Sequential Feature Fusion Network (SFFN), which simultaneously utilizes the global, spatial, and multi-scale contextual Region-of-Interest (RoI) features of an object and fuses them with a novel method. Specifically, we design a Feature Fusion Block (FFB) to fuse the global and multi-scale contextual RoI features, which are extracted by an RoI pooling layer. We then concatenate the fused feature with the spatial RoI feature extracted by a Position-Sensitive RoI (PSRoI) pooling layer. Experimental results show that SFFN achieves significant improvements on both the PASCAL VOC 2007 and VOC 2012 datasets.
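The abstract describes the head's structure but not its implementation. Below is a minimal, hypothetical PyTorch sketch of such a head, under several stated assumptions: the "global" feature is taken as ordinary RoI pooling on the proposal box, the contextual features come from RoI pooling on center-enlarged boxes (the factors 1.2 and 1.8 are invented for illustration), the FFB is approximated by channel concatenation plus a 1x1 convolution since the paper's actual fusion rule is not given here, and the spatial feature uses torchvision's `ps_roi_pool`. All class and function names are illustrative, not from the paper.

```python
# Hypothetical sketch of an SFFN-style head; not the authors' code.
import torch
import torch.nn as nn
from torchvision.ops import roi_pool, ps_roi_pool


def enlarge(rois, factor):
    """Scale each RoI (batch_idx, x1, y1, x2, y2) about its center to
    capture surrounding context. Boxes are not clamped to image bounds."""
    cx = (rois[:, 1] + rois[:, 3]) / 2
    cy = (rois[:, 2] + rois[:, 4]) / 2
    w = (rois[:, 3] - rois[:, 1]) * factor / 2
    h = (rois[:, 4] - rois[:, 2]) * factor / 2
    out = rois.clone()
    out[:, 1], out[:, 2] = cx - w, cy - h
    out[:, 3], out[:, 4] = cx + w, cy + h
    return out


class FeatureFusionBlock(nn.Module):
    """Placeholder FFB: concatenate pooled features along channels and
    mix them with a 1x1 convolution (the paper's fusion may differ)."""

    def __init__(self, channels, num_inputs):
        super().__init__()
        self.mix = nn.Conv2d(channels * num_inputs, channels, kernel_size=1)

    def forward(self, feats):
        return torch.relu(self.mix(torch.cat(feats, dim=1)))


class SFFNHead(nn.Module):
    def __init__(self, channels=256, ps_channels=10, k=7, context=(1.2, 1.8)):
        super().__init__()
        self.k, self.context = k, context
        self.ffb = FeatureFusionBlock(channels, 1 + len(context))
        # PSRoI pooling needs ps_channels * k * k position-sensitive maps.
        self.ps_map = nn.Conv2d(channels, ps_channels * k * k, kernel_size=1)

    def forward(self, fmap, rois, spatial_scale=1.0 / 16):
        # Global RoI feature plus multi-scale contextual RoI features.
        pooled = [roi_pool(fmap, rois, (self.k, self.k), spatial_scale)]
        for s in self.context:
            pooled.append(
                roi_pool(fmap, enlarge(rois, s), (self.k, self.k), spatial_scale))
        fused = self.ffb(pooled)                                # [K, C, k, k]
        # Spatial (position-sensitive) RoI feature via PSRoI pooling.
        spatial = ps_roi_pool(self.ps_map(fmap), rois,
                              (self.k, self.k), spatial_scale)  # [K, ps, k, k]
        # Final step from the abstract: concatenate fused and spatial features.
        return torch.cat([fused, spatial], dim=1)


# Usage with dummy inputs: one 256-channel feature map and one proposal.
fmap = torch.randn(1, 256, 38, 50)
rois = torch.tensor([[0.0, 64.0, 48.0, 320.0, 240.0]])  # (batch_idx, x1, y1, x2, y2)
print(SFFNHead()(fmap, rois).shape)  # torch.Size([1, 266, 7, 7])
```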



Acknowledgments

This work is supported by the NSFC (under Grants U1509206 and 61472276) and the Tianjin Natural Science Foundation (No. 15JCYBJC15400).

Author information

Correspondence to Yahong Han.


Copyright information

© 2018 Springer Nature Switzerland AG

About this paper


Cite this paper

Wang, Q., Han, Y. (2018). Sequential Feature Fusion for Object Detection. In: Hong, R., Cheng, W.H., Yamasaki, T., Wang, M., Ngo, C.W. (eds.) Advances in Multimedia Information Processing – PCM 2018. Lecture Notes in Computer Science, vol. 11164. Springer, Cham. https://doi.org/10.1007/978-3-030-00776-8_63


  • DOI: https://doi.org/10.1007/978-3-030-00776-8_63

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-00775-1

  • Online ISBN: 978-3-030-00776-8

  • eBook Packages: Computer Science, Computer Science (R0)
