
Sequential Feature Fusion for Object Detection

  • Conference paper

Advances in Multimedia Information Processing – PCM 2018 (PCM 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11164)


Abstract

In an image, the category and location of an object relate to its global, spatial, and contextual visual information, all of which are important for accurate and efficient object detection. In this paper, we propose a region-based detector named the Sequential Feature Fusion Network (SFFN), which simultaneously utilizes the global, spatial, and multi-scale contextual Region-of-Interest (RoI) features of an object and fuses them with a novel method. Specifically, we design a Feature Fusion Block (FFB) to fuse the global and multi-scale contextual RoI features, which are extracted by an RoI pooling layer. We then concatenate the fused feature with the spatial RoI feature extracted by a Position-Sensitive RoI (PSRoI) pooling layer. Experimental results show that SFFN achieves significant improvements on both the PASCAL VOC 2007 and VOC 2012 datasets.
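The abstract describes the head's structure but not its implementation. Below is a minimal, hypothetical PyTorch sketch of such a head, under several stated assumptions: the "global" feature is taken as ordinary RoI pooling on the proposal box, the contextual features come from RoI pooling on center-enlarged boxes (the factors 1.2 and 1.8 are invented for illustration), the FFB is approximated by channel concatenation plus a 1x1 convolution since the paper's actual fusion rule is not given here, and the spatial feature uses torchvision's `ps_roi_pool`. All class and function names are illustrative, not from the paper.

```python
# Hypothetical sketch of an SFFN-style head; not the authors' code.
import torch
import torch.nn as nn
from torchvision.ops import roi_pool, ps_roi_pool


def enlarge(rois, factor):
    """Scale each RoI (batch_idx, x1, y1, x2, y2) about its center to
    capture surrounding context. Boxes are not clamped to image bounds."""
    cx = (rois[:, 1] + rois[:, 3]) / 2
    cy = (rois[:, 2] + rois[:, 4]) / 2
    w = (rois[:, 3] - rois[:, 1]) * factor / 2
    h = (rois[:, 4] - rois[:, 2]) * factor / 2
    out = rois.clone()
    out[:, 1], out[:, 2] = cx - w, cy - h
    out[:, 3], out[:, 4] = cx + w, cy + h
    return out


class FeatureFusionBlock(nn.Module):
    """Placeholder FFB: concatenate pooled features along channels and
    mix them with a 1x1 convolution (the paper's fusion may differ)."""

    def __init__(self, channels, num_inputs):
        super().__init__()
        self.mix = nn.Conv2d(channels * num_inputs, channels, kernel_size=1)

    def forward(self, feats):
        return torch.relu(self.mix(torch.cat(feats, dim=1)))


class SFFNHead(nn.Module):
    def __init__(self, channels=256, ps_channels=10, k=7, context=(1.2, 1.8)):
        super().__init__()
        self.k, self.context = k, context
        self.ffb = FeatureFusionBlock(channels, 1 + len(context))
        # PSRoI pooling needs ps_channels * k * k position-sensitive maps.
        self.ps_map = nn.Conv2d(channels, ps_channels * k * k, kernel_size=1)

    def forward(self, fmap, rois, spatial_scale=1.0 / 16):
        # Global RoI feature plus multi-scale contextual RoI features.
        pooled = [roi_pool(fmap, rois, (self.k, self.k), spatial_scale)]
        for s in self.context:
            pooled.append(
                roi_pool(fmap, enlarge(rois, s), (self.k, self.k), spatial_scale))
        fused = self.ffb(pooled)                                # [K, C, k, k]
        # Spatial (position-sensitive) RoI feature via PSRoI pooling.
        spatial = ps_roi_pool(self.ps_map(fmap), rois,
                              (self.k, self.k), spatial_scale)  # [K, ps, k, k]
        # Final step from the abstract: concatenate fused and spatial features.
        return torch.cat([fused, spatial], dim=1)


# Usage with dummy inputs: one 256-channel feature map and one proposal.
fmap = torch.randn(1, 256, 38, 50)
rois = torch.tensor([[0.0, 64.0, 48.0, 320.0, 240.0]])  # (batch_idx, x1, y1, x2, y2)
print(SFFNHead()(fmap, rois).shape)  # torch.Size([1, 266, 7, 7])
```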



Acknowledgments

This work is supported by the NSFC (under Grants U1509206 and 61472276) and the Tianjin Natural Science Foundation (No. 15JCYBJC15400).

Author information

Correspondence to Yahong Han.


Copyright information

© 2018 Springer Nature Switzerland AG

About this paper


Cite this paper

Wang, Q., Han, Y. (2018). Sequential Feature Fusion for Object Detection. In: Hong, R., Cheng, W.H., Yamasaki, T., Wang, M., Ngo, C.W. (eds.) Advances in Multimedia Information Processing – PCM 2018. Lecture Notes in Computer Science, vol. 11164. Springer, Cham. https://doi.org/10.1007/978-3-030-00776-8_63


  • DOI: https://doi.org/10.1007/978-3-030-00776-8_63

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-00775-1

  • Online ISBN: 978-3-030-00776-8

  • eBook Packages: Computer Science, Computer Science (R0)
