Spatial-temporal Fusion Network with Residual Learning and Attention Mechanism: A Benchmark for Video-Based Group Re-ID

Xu, Qiling; Yang, Hua; Chen, Lin

doi:10.1007/978-3-030-31654-9_42


Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11857)

Included in the following conference series:

Chinese Conference on Pattern Recognition and Computer Vision (PRCV)

Abstract

Video-based group re-identification (Re-ID) remains a meaningful but rarely studied task. Group Re-ID captures the relationships between pedestrians, while video sequences provide more frames from which to identify a person. In this paper, we propose a spatial-temporal fusion network for group Re-ID. The network combines residual learning between the CNN and the RNN in a unified architecture with an attention mechanism that focuses the system on discriminative features. We also propose a new group Re-ID dataset, DukeGroupVid, to evaluate the performance of our spatial-temporal fusion network. Comprehensive experimental results on the proposed dataset and on other video-based datasets (PRID-2011, i-LIDS-VID and MARS) demonstrate the effectiveness of our model.
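
The abstract names three ingredients: per-frame spatial features from a CNN, temporal modelling by an RNN with a residual link between the two, and attention over the result. The sketch below is one plausible reading of that pipeline, not the authors' implementation. Everything in it is an assumption made for illustration: the toy elementwise recurrence in `simple_rnn`, the additive form of `residual_fuse`, the dot-product scoring in `attention_pool`, and the hand-written `frame_features` that stand in for CNN outputs.

```python
import math

def simple_rnn(frames, decay=0.5):
    """Toy recurrence (assumed): h_t = tanh(x_t + decay * h_{t-1}), elementwise."""
    hidden = [0.0] * len(frames[0])
    outputs = []
    for x in frames:
        hidden = [math.tanh(xi + decay * hi) for xi, hi in zip(x, hidden)]
        outputs.append(hidden)
    return outputs

def residual_fuse(spatial, temporal):
    """Residual link (assumed additive): spatial feature plus temporal feature per frame."""
    return [[s + t for s, t in zip(sv, tv)] for sv, tv in zip(spatial, temporal)]

def attention_pool(features, query):
    """Softmax-weighted average over frames; scores are dot products with `query`."""
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in features]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(features[0])
    pooled = [sum(w * feat[d] for w, feat in zip(weights, features))
              for d in range(dim)]
    return pooled, weights

# Hypothetical 2-dimensional features for three frames (stand-ins for CNN outputs).
frame_features = [[0.2, 0.1], [0.9, 0.8], [0.1, 0.3]]
temporal = simple_rnn(frame_features)
fused = residual_fuse(frame_features, temporal)
sequence_feature, weights = attention_pool(fused, query=[1.0, 1.0])
```

In this toy setting the attention weights sum to one and the second frame, whose fused feature is largest, receives the highest weight. A real model would learn the recurrent weights and the attention query rather than fix them by hand.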

The first author is a student.

Acknowledgement

This work was supported in part by the National Natural Science Foundation of China (NSFC, Grant Nos. 61771303 and 61671289), the Science and Technology Commission of Shanghai Municipality (STCSM, Grant Nos. 17DZ1205602, 18DZ1200102, 18DZ2270700), and the SJTU-Yitu/Thinkforce Joint Laboratory for Visual Computing and Application. Funded by the National Engineering Laboratory for Public Safety Risk Perception and Control by Big Data (PSRPC).

Author information

Authors and Affiliations

Institution of Image Communication and Network Engineering, Shanghai Jiao Tong University, Shanghai, China
Qiling Xu, Hua Yang & Lin Chen

Corresponding author

Correspondence to Hua Yang.

Editor information

Editors and Affiliations

School of EECS, Peking University, Beijing, China
Zhouchen Lin
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Liang Wang
Nanjing University, Nanjing, Jiangsu, China
Jian Yang
Xidian University, Xi'an, China
Guangming Shi
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Tieniu Tan
Institute of Artificial Intelligence, Xi'an Jiaotong University, Xi'an, Shaanxi, China
Nanning Zheng
Chinese Academy of Sciences, Beijing, China
Xilin Chen
Northwestern Polytechnical University, Xi'an, China
Yanning Zhang

Cite this paper

Xu, Q., Yang, H., Chen, L. (2019). Spatial-temporal Fusion Network with Residual Learning and Attention Mechanism: A Benchmark for Video-Based Group Re-ID. In: Lin, Z., et al. Pattern Recognition and Computer Vision. PRCV 2019. Lecture Notes in Computer Science, vol 11857. Springer, Cham. https://doi.org/10.1007/978-3-030-31654-9_42

DOI: https://doi.org/10.1007/978-3-030-31654-9_42
Published: 31 October 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-31653-2
Online ISBN: 978-3-030-31654-9
eBook Packages: Computer Science, Computer Science (R0)
