Explainable Sparse Attention for Memory-Based Trajectory Predictors

  • Conference paper
Computer Vision – ECCV 2022 Workshops (ECCV 2022)

Abstract

In this paper we address the problem of trajectory prediction, focusing on memory-based models. Such methods are trained to collect a set of useful samples that can be retrieved and used at test time to condition predictions. We propose Explainable Sparse Attention (ESA), a module that can be seamlessly plugged into several existing state-of-the-art memory-based predictors. ESA generates sparse attention over the memory, thus selecting a small subset of memory entries that are relevant for the observed trajectory. This enables an explanation of the model’s predictions with reference to previously observed training samples. Furthermore, we demonstrate significant improvements on three trajectory prediction datasets.
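
The idea of sparse attention over a memory can be illustrated with a minimal sketch. The abstract does not state which sparse activation ESA uses, so the example below assumes sparsemax (Martins and Astudillo, 2016) as a stand-in; the dot-product scoring, the key/value memory layout, and the names `sparsemax` and `sparse_memory_read` are hypothetical choices made for illustration, not the authors' implementation.

```python
import numpy as np

def sparsemax(scores):
    """Euclidean projection of a 1-D score vector onto the probability simplex."""
    z = np.asarray(scores, dtype=float)
    z_sorted = np.sort(z)[::-1]               # scores in descending order
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, z.size + 1)
    support = 1.0 + k * z_sorted > cumsum     # which sorted entries stay non-zero
    k_max = k[support][-1]                    # size of the support
    tau = (cumsum[k_max - 1] - 1.0) / k_max   # threshold subtracted from every score
    return np.maximum(z - tau, 0.0)

def sparse_memory_read(query, keys, values):
    """Attend over memory with sparsemax; return the readout and the selected entries.

    Assumed layout (hypothetical): keys[i] encodes a stored past trajectory,
    values[i] encodes the future that followed it.
    """
    scores = keys @ query                     # dot-product similarity (an assumption)
    weights = sparsemax(scores)               # many weights are exactly zero
    selected = np.flatnonzero(weights)        # memory entries that explain the prediction
    readout = weights @ values                # weighted combination of the selected values
    return readout, weights, selected

# Toy memory with four entries.
keys = np.array([[2.0, 0.0], [1.5, 0.0], [0.0, 1.0], [-1.0, 0.0]])
values = np.array([[1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [9.0, 9.0]])
query = np.array([1.0, 0.0])

readout, weights, selected = sparse_memory_read(query, keys, values)
print(weights)   # weights 0.75 and 0.25 on entries 0 and 1, zero elsewhere
print(selected)  # indices of the memory entries that received non-zero attention
```

Unlike softmax, which assigns a non-zero weight to every memory entry, a sparse activation of this kind leaves only a few entries active, and those indices can be reported as the training samples behind a prediction.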

Acknowledgements

This work was supported by the European Commission under the European Horizon 2020 Programme, grant number 951911 - AI4Media.

Author information

Corresponding author

Correspondence to Federico Becattini.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Marchetti, F., Becattini, F., Seidenari, L., Del Bimbo, A. (2023). Explainable Sparse Attention for Memory-Based Trajectory Predictors. In: Karlinsky, L., Michaeli, T., Nishino, K. (eds) Computer Vision – ECCV 2022 Workshops. ECCV 2022. Lecture Notes in Computer Science, vol 13805. Springer, Cham. https://doi.org/10.1007/978-3-031-25072-9_37

  • DOI: https://doi.org/10.1007/978-3-031-25072-9_37

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-25071-2

  • Online ISBN: 978-3-031-25072-9

  • eBook Packages: Computer Science, Computer Science (R0)
