Abstract
Activity recognition from RGB-D videos remains an open problem because of the large variety of actions. In this work, we propose a new architecture that combines a high-level handcrafted strategy with machine learning techniques. To address this variety, we propose a novel two-level fusion strategy that combines features from different cues. Because similar actions are common in daily living activities, we also propose a mechanism for discriminating between similar actions. We validate our approach on four public datasets, CAD-60, CAD-120, MSRDailyActivity3D, and NTU-RGB+D, and improve on the state-of-the-art results on each of them.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Das, S., Thonnat, M., Sakhalkar, K., Koperski, M., Bremond, F., Francesca, G. (2019). A New Hybrid Architecture for Human Activity Recognition from RGB-D Videos. In: Kompatsiaris, I., Huet, B., Mezaris, V., Gurrin, C., Cheng, WH., Vrochidis, S. (eds) MultiMedia Modeling. MMM 2019. Lecture Notes in Computer Science, vol 11296. Springer, Cham. https://doi.org/10.1007/978-3-030-05716-9_40
DOI: https://doi.org/10.1007/978-3-030-05716-9_40
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-05715-2
Online ISBN: 978-3-030-05716-9
eBook Packages: Computer Science, Computer Science (R0)