Abstract
Human activity recognition has become one of the most active research topics in image processing and pattern recognition. Manual analysis of video is labour-intensive, fatiguing, and error-prone. Recognizing human activities from video can lead to improvements in several application fields, such as surveillance systems, human-computer interfaces, sports video analysis, digital shopping assistants, video retrieval, gaming, and health care. This paper aims to recognize individual actions within a sequence of continuous actions recorded with a Kinect sensor, based on the positions of the main skeleton joints. The typical approach is to use manually labeled data to perform supervised training. In this paper we propose a method for automatic temporal segmentation that separates the sequence into a set of distinct actions. By measuring the amount of movement that occurs in each joint of the skeleton, we find temporal segments that correspond to individual actions. We also propose an automatic labeling method for human actions, using a clustering algorithm on a subset of the available features.
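The segmentation idea the abstract describes, measuring the amount of movement in each skeleton joint and cutting the sequence where motion drops, can be sketched as follows. The array layout (frames x joints x 3-D coordinates), the smoothing window, and the rest threshold are illustrative assumptions, not the paper's actual features or parameters.

```python
import numpy as np

def motion_energy(skeleton, window=5):
    """Total joint displacement per frame, smoothed with a moving average.

    skeleton: array of shape (T, J, 3) -- T frames, J joints, 3-D positions.
    This layout is a hypothetical stand-in for the Kinect skeleton stream.
    """
    disp = np.linalg.norm(np.diff(skeleton, axis=0), axis=2)  # (T-1, J) per-joint movement
    energy = disp.sum(axis=1)                                 # total movement per frame
    kernel = np.ones(window) / window
    return np.convolve(energy, kernel, mode="same")           # smooth out jitter

def segment_boundaries(energy, rest_thresh):
    """Frames where smoothed motion falls below rest_thresh mark pauses
    between actions; each onset of a low-motion run starts a new segment."""
    low = energy < rest_thresh
    boundaries = []
    prev = False
    for i, flag in enumerate(low):
        if flag and not prev:
            boundaries.append(i)
        prev = flag
    return boundaries

# Synthetic demo: one joint moves during frames 10-24 and 40-54, rests otherwise.
step = np.zeros(60)
step[10:25] = 0.2
step[40:55] = 0.2
skeleton = np.zeros((60, 2, 3))
skeleton[:, 0, 0] = np.cumsum(step)   # x-coordinate of joint 0 drifts during actions

boundaries = segment_boundaries(motion_energy(skeleton), rest_thresh=0.05)
# Detects three low-motion onsets: the initial rest and the pause after each action.
```

Each detected segment could then be summarized by a feature vector (e.g. aggregated joint displacements) and grouped with any clustering algorithm to assign labels automatically, in the spirit of the unsupervised labeling step the abstract mentions.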
© 2016 Springer International Publishing Switzerland
Cite this paper
Jardim, D., Nunes, L., Dias, M.S. (2016). Automatic Human Activity Segmentation and Labeling in RGBD Videos. In: Czarnowski, I., Caballero, A., Howlett, R., Jain, L. (eds) Intelligent Decision Technologies 2016. IDT 2016. Smart Innovation, Systems and Technologies, vol 56. Springer, Cham. https://doi.org/10.1007/978-3-319-39630-9_32
DOI: https://doi.org/10.1007/978-3-319-39630-9_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-39629-3
Online ISBN: 978-3-319-39630-9
eBook Packages: Engineering (R0)