Video content categorization using the double decomposition

Du, Youtian; Chen, Feng; Xu, Wenli; Qian, Xueming

doi:10.1007/s11042-012-1213-y

Published: 08 September 2012

Volume 66, pages 545–572 (2013)
Multimedia Tools and Applications

Abstract

Video content has a complex structure owing to the variety of components and events involved. For example, surveillance videos often record interactions among multiple objects and contain motion detail at various scales, while web videos are composed of multimodal cues, each of which generally carries information at a variety of scales. In general, video content combines its inherent structures in two ways: multi-modality/multi-scale and multi-object/multi-scale. In this paper, we therefore propose a new framework for video content modeling in which video content is decomposed into multiple interacting processes by a double decomposition that targets each type of structural combination. To model the resulting processes, we propose double-decomposed hidden Markov models (DDHMMs), which contain multiple state chains corresponding to the interacting processes. To make the switching frequency of the states in each chain consistent with the scale of the corresponding process, a durational state variable is introduced into DDHMMs. The proposed method performs well in modeling both the relations among the interacting processes and the dynamics of each process. We discuss appropriate features under the proposed framework and evaluate DDHMMs in two applications, human motion recognition and web video categorization. The experimental results demonstrate that the double decomposition improves video categorization performance in both applications.
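The abstract names two modeling ingredients: several state chains whose transitions depend on one another, and a durational state variable that sets how often each chain switches state. The abstract does not give the DDHMM parameterization, so what follows is only a minimal sketch under stated assumptions, not the authors' model. It samples from a hypothetical two-chain coupled HMM in which each chain's next state is conditioned on both chains' previous states, each newly entered state is held for a duration drawn uniformly from a chain-specific range, and each chain emits a 1-D Gaussian observation. The function name `sample_two_chain_duration_hmm`, the uniform duration distribution, and the Gaussian emissions are all illustrative assumptions; learning and inference (e.g. Viterbi decoding) are not shown.

```python
import numpy as np


def sample_two_chain_duration_hmm(T, trans, dur_ranges, emit_means, emit_std=0.1, seed=0):
    """Draw T steps from a hypothetical two-chain coupled HMM with explicit state durations.

    trans[c][i, j, k]: probability that chain c enters state k, given that chain 0
        was in state i and chain 1 was in state j at the previous step.
    dur_ranges[c] = (lo, hi): chain c holds each newly entered state for a number of
        steps drawn uniformly from lo..hi (inclusive).
    emit_means[c][k]: mean of the 1-D Gaussian observation chain c emits in state k.
    Returns (states, obs), both of shape (2, T).
    """
    rng = np.random.default_rng(seed)
    means = np.asarray(emit_means, dtype=float)  # shape (2, K)
    K = means.shape[1]
    states = np.zeros((2, T), dtype=int)
    obs = np.zeros((2, T))
    remaining = np.zeros(2, dtype=int)  # durational variable: steps left in the current state
    prev = rng.integers(0, K, size=2)   # arbitrary joint state "before" t = 0
    for t in range(T):
        new = prev.copy()
        for c in range(2):
            if remaining[c] == 0:
                # Duration expired: the next state depends on BOTH chains' previous
                # states (the coupling), and a fresh duration is drawn for this chain.
                new[c] = rng.choice(K, p=trans[c][prev[0], prev[1]])
                lo, hi = dur_ranges[c]
                remaining[c] = rng.integers(lo, hi + 1)
            remaining[c] -= 1  # spend one step of the current duration
        states[:, t] = new
        # Each chain emits from the Gaussian attached to its current state.
        obs[:, t] = rng.normal(means[[0, 1], new], emit_std)
        prev = new
    return states, obs
```

For example, calling it with `dur_ranges=[(1, 2), (8, 12)]` makes the second chain change state far less often than the first, which is the kind of scale matching the durational variable is meant to provide. Conditioning every chain on the joint previous state is the simplest coupling to write down; a real model would likely factor or restrict it to keep the parameter count manageable as chains are added.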

Acknowledgements

The research presented in this paper is supported in part by the following programs of China: the National Natural Science Foundation (60905018, 60903121, 61173109, 61175039), the Key Projects in the National Science & Technology Pillar Program (2011BAK08B02), the Research Fund for the Doctoral Program of Higher Education (20090201120032), and the Fundamental Research Funds for the Central Universities (xjj2009041, xjj20100051). The authors would like to thank the video team at United Technologies Research Center (UTRC) for pertinent and constructive discussions, and Dr. K.P. Murphy for his Matlab Bnet toolbox. The authors also thank the anonymous reviewers for their constructive advice.

Author information

Authors and Affiliations

Ministry of Education Key Lab for Intelligent Networks and Network Security, Xi’an Jiaotong University, Xi’an, 710049, China
Youtian Du
Department of Automation, Tsinghua University, Beijing, 100084, China
Feng Chen & Wenli Xu
School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, 710049, China
Xueming Qian

Corresponding author

Correspondence to Xueming Qian.

About this article

Cite this article

Du, Y., Chen, F., Xu, W. et al. Video content categorization using the double decomposition. Multimed Tools Appl 66, 545–572 (2013). https://doi.org/10.1007/s11042-012-1213-y

Issue Date: October 2013
DOI: https://doi.org/10.1007/s11042-012-1213-y
