
Learning discriminative context models for concurrent collective activity recognition

  • Published in: Multimedia Tools and Applications

Abstract

Collective activity classification is the task of identifying activities performed by multiple persons, which often relies on context information such as person relationships and person interactions. Most existing approaches assume that all individuals in a single image share the same activity label. In real-world scenarios, however, multiple activities often co-exist and serve as context cues for one another. Based on this observation, this paper proposes a unified discriminative learning framework of multiple context models for concurrent collective activity recognition. First, we consider both the intra-class and inter-class behaviour interactions among the persons in a scene. In addition, the scene in which activities occur provides further context for recognizing specific collective activities. Finally, we jointly model these multiple context cues (intra-class, inter-class, and global context) within a max-margin learning framework, and a greedy forward search method is used to label the activities in the test scenes. Experimental results demonstrate the superiority of our approach in activity recognition.
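The greedy forward search mentioned above can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: starting from an empty assignment, it repeatedly commits the (group, activity) assignment that most increases a joint score, until every group is labeled. The `toy_score` function, the group names, and the activity labels are all hypothetical stand-ins for the learned max-margin context model.

```python
from itertools import product

def greedy_forward_search(groups, activities, score):
    """Greedily assign each group the activity that maximizes the joint
    score; `score` maps a partial {group: activity} dict to a float."""
    labels = {}
    unlabeled = set(groups)
    while unlabeled:
        current = score(labels)
        best_gain, best_assign = 0.0, None
        # Evaluate every remaining (group, activity) assignment.
        for g, a in product(unlabeled, activities):
            candidate = dict(labels, **{g: a})
            gain = score(candidate) - current
            if best_assign is None or gain > best_gain:
                best_gain, best_assign = gain, (g, a)
        g, a = best_assign
        labels[g] = a          # commit the highest-gain assignment
        unlabeled.remove(g)
    return labels

# Hypothetical scoring function: per-group unary preferences plus a
# pairwise bonus when "crossing" and "waiting" co-occur, mimicking an
# inter-class context cue.
UNARY = {("g1", "crossing"): 2.0, ("g2", "waiting"): 1.5}

def toy_score(labels):
    s = sum(UNARY.get((g, a), 0.0) for g, a in labels.items())
    if {"crossing", "waiting"} <= set(labels.values()):
        s += 1.0  # inter-class co-occurrence bonus
    return s

print(greedy_forward_search(["g1", "g2"], ["crossing", "waiting"], toy_score))
# → {'g1': 'crossing', 'g2': 'waiting'}
```

Because the pairwise term couples the two groups' labels, the second assignment depends on the first, which is exactly why a joint (rather than per-group independent) labeling procedure is needed.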



Acknowledgments

This work was supported by the 863 Program under Grant 2014AA015104 and by the National Natural Science Foundation of China under Grants 61273034 and 61332016.

Author information

Corresponding author

Correspondence to Chaoyang Zhao.

About this article


Cite this article

Zhao, C., Wang, J. & Lu, H. Learning discriminative context models for concurrent collective activity recognition. Multimed Tools Appl 76, 7401–7420 (2017). https://doi.org/10.1007/s11042-016-3393-3
