
MessyTable: Instance Association in Multiple Camera Views

  • Conference paper
Computer Vision – ECCV 2020 (ECCV 2020)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 12356)

Abstract

We present an interesting and challenging dataset that features a large number of scenes with messy tables captured from multiple camera views. Each scene in this dataset is highly complex, containing multiple object instances that may be identical, stacked, and occluded by other instances. The key challenge is to associate all instances given the RGB images of all views. This seemingly simple task surprisingly defeats many popular methods and heuristics that are commonly assumed to perform well in object association. The dataset challenges existing methods in mining subtle appearance differences, reasoning based on context, and fusing appearance with geometric cues to establish associations. We report interesting findings with several popular baselines, and discuss how this dataset could help inspire new problems and catalyse more robust formulations to tackle real-world instance association problems. (Project page: https://caizhongang.github.io/projects/MessyTable/.)
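The abstract mentions fusing appearance with geometric cues to establish cross-view associations. As a minimal sketch of that general recipe (not the authors' method), one can combine an appearance-distance matrix and a geometric-distance matrix into a single cost matrix and solve the resulting assignment problem with the Hungarian algorithm. The function name `associate_instances`, the fusion weight `alpha`, and the `max_cost` threshold below are all hypothetical choices for illustration only.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate_instances(app_dist, geo_dist, alpha=0.5, max_cost=1.0):
    """Match instances between two camera views.

    app_dist, geo_dist: (N, M) distance matrices between the N instances
    detected in view A and the M instances detected in view B, e.g.
    appearance-embedding distances and epipolar distances. Fused pairs
    whose cost exceeds max_cost are left unmatched.
    """
    # Fuse the two cues into one cost matrix, then solve the
    # one-to-one assignment with the Hungarian algorithm.
    cost = alpha * app_dist + (1.0 - alpha) * geo_dist
    rows, cols = linear_sum_assignment(cost)
    return [(int(r), int(c)) for r, c in zip(rows, cols)
            if cost[r, c] <= max_cost]

# Toy example: 3 instances per view; low costs on the diagonal.
app = np.array([[0.1, 0.9, 0.8],
                [0.9, 0.2, 0.7],
                [0.8, 0.9, 0.3]])
geo = np.array([[0.2, 0.8, 0.9],
                [0.7, 0.1, 0.8],
                [0.9, 0.9, 0.2]])
print(associate_instances(app, geo))  # → [(0, 0), (1, 1), (2, 2)]
```

With more than two views, such pairwise matchings would additionally need to be reconciled into globally consistent identities, which is part of what makes the dataset challenging.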

Z. Cai and J. Zhang contributed equally.



Acknowledgements

This research was supported by SenseTime-NTU Collaboration Project, Singapore MOE AcRF Tier 1 (2018-T1-002-056), NTU SUG, and NTU NAP.

Author information

Corresponding author

Correspondence to Chen Change Loy.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 11656 KB)


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Cai, Z. et al. (2020). MessyTable: Instance Association in Multiple Camera Views. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M. (eds.) Computer Vision – ECCV 2020. Lecture Notes in Computer Science, vol. 12356. Springer, Cham. https://doi.org/10.1007/978-3-030-58621-8_1


  • DOI: https://doi.org/10.1007/978-3-030-58621-8_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58620-1

  • Online ISBN: 978-3-030-58621-8

  • eBook Packages: Computer Science, Computer Science (R0)
