Norm-Aware Embedding for Efficient Person Search and Tracking

Chen, Di; Zhang, Shanshan; Yang, Jian; Schiele, Bernt

doi:10.1007/s11263-021-01512-5

Published: 14 September 2021

International Journal of Computer Vision, Volume 129, pages 3154–3168 (2021)

Di Chen¹,², Shanshan Zhang¹, Jian Yang¹ & Bernt Schiele²

1013 Accesses
9 Citations
1 Altmetric

Abstract

Person detection and re-identification (re-ID) are two well-defined support tasks for practically relevant tasks such as Person Search and Multiple Person Tracking. Person Search aims to find and locate all instances with the same identity as the query person in a set of panoramic gallery images. Similarly, Multiple Person Tracking, especially when using the tracking-by-detection pipeline, requires detecting and associating all persons that appear in consecutive video frames. One major challenge shared by the two tasks comes from the contradictory goals of detection and re-identification, i.e., person detection focuses on finding the commonness of all persons while person re-ID handles the differences among multiple identities. Therefore, it is crucial to reconcile the relationship between the two support tasks in a joint model. To this end, we present a novel approach called Norm-Aware Embedding to disentangle the person embedding into norm and angle for detection and re-ID respectively, allowing for both effective and efficient multi-task training. We further extend the proposal-level person embedding to pixel-level, whose discrimination ability is less affected by misalignment. Our Norm-Aware Embedding achieves remarkable performance on both person search and multiple person tracking benchmarks, with the merit of being easy to train and resource-friendly.
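
The norm/angle split described in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It only shows that a vector's length and its direction are separate quantities that can feed two different tasks. The function names, the sigmoid squashing, and the scale and shift parameters are assumptions made for illustration.

```python
import numpy as np


def decompose(embedding, eps=1e-12):
    """Split a vector into its L2 norm and its unit-length direction."""
    norm = float(np.linalg.norm(embedding))
    direction = embedding / max(norm, eps)
    return norm, direction


def detection_score(norm, scale=1.0, shift=0.0):
    """Hypothetical mapping of the norm into (0, 1); scale and shift are illustrative."""
    return float(1.0 / (1.0 + np.exp(-(scale * norm - shift))))


def reid_similarity(dir_a, dir_b):
    """Cosine similarity of two unit directions; it depends on the angle only."""
    return float(np.dot(dir_a, dir_b))


# Two toy embeddings pointing the same way but with different lengths.
query = np.array([3.0, 4.0])
candidate = np.array([0.6, 0.8])

norm_q, dir_q = decompose(query)
norm_c, dir_c = decompose(candidate)

print("norms:", norm_q, norm_c)
print("detection scores:", detection_score(norm_q), detection_score(norm_c))
print("re-ID similarity:", reid_similarity(dir_q, dir_c))
```

In this toy case the two vectors get different detection scores because their norms differ, while their re-ID similarity is maximal because their directions coincide.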

Notes

https://github.com/DeanChan/NAE4PS. Code will be updated at this site.
https://motchallenge.net/results/MOT17/.

Acknowledgements

This work was partially supported by the National Science Fund of China (Grant No. U1713208), Funds for International Co-operation and Exchange of the National Natural Science Foundation of China (Grant No. 61861136011), “111” Program B13022, Natural Science Foundation of Jiangsu Province, China (Grant No. BK20181299), and National Key Research and Development Program of China (Grant No. 2017YFC0820601).

Author information

Authors and Affiliations

PCA Lab, Key Lab of Intelligent Perception and Systems for High-Dimensional Information of Ministry of Education, and Jiangsu Key Lab of Image and Video Understanding for Social Security, School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China
Di Chen, Shanshan Zhang & Jian Yang
Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany
Di Chen & Bernt Schiele

Corresponding authors

Correspondence to Shanshan Zhang or Jian Yang.

Additional information

Communicated by Ivan Laptev.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Chen, D., Zhang, S., Yang, J. et al. Norm-Aware Embedding for Efficient Person Search and Tracking. Int J Comput Vis 129, 3154–3168 (2021). https://doi.org/10.1007/s11263-021-01512-5

Received: 29 September 2020
Accepted: 30 July 2021
Published: 14 September 2021
Issue Date: November 2021
DOI: https://doi.org/10.1007/s11263-021-01512-5
