Abstract
This paper proposes a framework for RGB-D-based action recognition that takes advantage of hand-designed features from skeleton data and deeply learned features from depth maps, and effectively exploits both local and global temporal information. Specifically, depth and skeleton data are first augmented for deep learning and to make recognition insensitive to view variation. Second, depth sequences are segmented using handcrafted features based on a skeleton-joint motion histogram, which exploits the local temporal information. All training segments are clustered with an Infinite Gaussian Mixture Model (IGMM) through Bayesian estimation and labelled for training Convolutional Neural Networks (ConvNets) on the depth maps. A depth sequence can thus be reliably encoded into a sequence of segment labels. Finally, the label sequence is fed into a joint Hidden Markov Model and Support Vector Machine (HMM-SVM) classifier to exploit the global temporal information for final recognition. The proposed framework was evaluated on the widely used MSRAction-Pairs, MSRDailyActivity3D and UTD-MHAD datasets and achieved promising results.
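The segment-clustering step described above can be illustrated with a minimal sketch. A Dirichlet-process Gaussian mixture (scikit-learn's `BayesianGaussianMixture` with a Dirichlet-process prior) is a practical stand-in for the paper's IGMM: the truncation level `n_components` is only an upper bound, and unused components receive near-zero weight, so the effective number of clusters is inferred from the data. The toy "segment features" below are placeholders, not the paper's actual motion-histogram descriptors.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
# Toy "segment features": two well-separated groups of 8-D vectors
# standing in for per-segment motion-histogram descriptors.
segments = np.vstack([
    rng.normal(0.0, 0.1, size=(50, 8)),
    rng.normal(1.0, 0.1, size=(50, 8)),
])

# Dirichlet-process mixture as an approximation of the IGMM:
# n_components is a truncation level, not the final cluster count.
dpgmm = BayesianGaussianMixture(
    n_components=10,
    weight_concentration_prior_type="dirichlet_process",
    covariance_type="diag",
    random_state=0,
).fit(segments)

# One symbol per segment; a depth sequence then becomes a sequence
# of such labels, ready for a temporal classifier such as HMM-SVM.
labels = dpgmm.predict(segments)
```

In the full pipeline these labels would be produced per depth-sequence segment and the resulting label sequences passed to the HMM-SVM stage; that stage is omitted here since it depends on the paper's specific segmentation.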
Acknowledgements
This work was funded by the National Natural Science Foundation of China (Nos. 61571325 and 61502357) and the Fundamental Research Funds for the Central Universities, China University of Geosciences (Wuhan) (No. CUG170654).
Cite this article
Wang, S., Hou, Y., Li, Z. et al. Combining ConvNets with hand-crafted features for action recognition based on an HMM-SVM classifier. Multimed Tools Appl 77, 18983–18998 (2018). https://doi.org/10.1007/s11042-017-5335-0