Deep Residual Temporal Convolutional Networks for Skeleton-Based Human Action Recognition

Khamsehashari, R.; Gadzicki, K.; Zetzsche, C.

doi:10.1007/978-3-030-34995-0_34


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11754)

Included in the following conference series:

International Conference on Computer Vision Systems


Abstract

Deep residual networks for skeleton-based action recognition can avoid the degradation problem, and a 56-layer Res-Net has recently achieved good results. Since a much “shallower” 11-layer model (Res-TCN), which combines a temporal convolutional network with a simplified residual unit, achieved nearly competitive performance, we investigate deep variants of Res-TCN and compare them to Res-Net architectures. Our models outperform the other approaches in this class of residual networks. Our investigation suggests that the resistance of deep residual networks to degradation is determined not only by the architecture but also by properties of the data and the task.
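To make the idea of a residual temporal convolutional unit more concrete, here is a minimal NumPy sketch. It assumes a pre-activation unit of the form x + Conv(ReLU(BN(x))) with an identity skip connection, applied to a skeleton sequence stored as a channels × time array (channels being flattened joint coordinates). The function names (`batch_norm`, `temporal_conv`, `residual_unit`, `deep_res_tcn`), the per-sequence normalization, and the omission of learned BN parameters and convolution biases are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # Normalize each channel (row) of x to zero mean and unit variance over time.
    # Simplification (assumption): statistics of a single sequence, no learned scale/shift.
    mean = x.mean(axis=1, keepdims=True)
    var = x.var(axis=1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def temporal_conv(x, w):
    # 1D convolution (cross-correlation, as in deep-learning frameworks) along time.
    # x: (C_in, T); w: (C_out, C_in, K) with odd K; 'same' zero padding, stride 1.
    c_out, c_in, k = w.shape
    assert x.shape[0] == c_in and k % 2 == 1
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    out = np.zeros((c_out, x.shape[1]))
    for t in range(x.shape[1]):
        window = xp[:, t:t + k]  # (C_in, K) slice centred on frame t
        out[:, t] = np.tensordot(w, window, axes=([1, 2], [0, 1]))
    return out

def residual_unit(x, w):
    # Hypothetical pre-activation residual unit: identity skip + Conv(ReLU(BN(x))).
    # The identity skip requires C_out == C_in.
    return x + temporal_conv(np.maximum(batch_norm(x), 0.0), w)

def deep_res_tcn(x, weights):
    # Stack residual units; a "deeper" variant in this sketch simply uses more weights.
    for w in weights:
        x = residual_unit(x, w)
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((6, 20))  # e.g. 2 joints x 3 coordinates, 20 frames
    weights = [0.01 * rng.standard_normal((6, 6, 9)) for _ in range(10)]
    y = deep_res_tcn(x, weights)
    print(y.shape)  # (6, 20)
```

With all weights set to zero, each unit in this sketch reduces to the identity, so adding units cannot by itself destroy the input signal. That identity path is the usual intuition behind the resistance of residual networks to degradation, although the abstract's point is that data and task properties also play a role in practice.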

This work has been supported by the German Aerospace Center (DLR) with financial means of the German Federal Ministry for Economic Affairs and Energy (BMWi), project “OPA³L” (grant No. 50 NA 1909) and by the German Research Foundation DFG, as part of CRC (Sonderforschungsbereich) 1320 “EASE - Everyday Activity Science and Engineering”, University of Bremen (http://www.ease-crc.org/).




Author information

Authors and Affiliations

Cognitive Neuroinformatics, University of Bremen, Bremen, Germany
R. Khamsehashari, K. Gadzicki & C. Zetzsche


Corresponding author

Correspondence to R. Khamsehashari.

Editor information

Editors and Affiliations

Centre for Research and Technology Hellas (CERTH-ITI), Thessaloniki, Greece
Dimitrios Tzovaras
Centre for Research and Technology Hellas (CERTH-ITI), Thessaloniki, Greece
Dimitrios Giakoumis
Vienna University of Technology, Vienna, Austria
Markus Vincze
Foundation for Research and Technology Hellas (FORTH), Heraklion, Greece
Antonis Argyros


About this paper

Cite this paper

Khamsehashari, R., Gadzicki, K., Zetzsche, C. (2019). Deep Residual Temporal Convolutional Networks for Skeleton-Based Human Action Recognition. In: Tzovaras, D., Giakoumis, D., Vincze, M., Argyros, A. (eds) Computer Vision Systems. ICVS 2019. Lecture Notes in Computer Science, vol 11754. Springer, Cham. https://doi.org/10.1007/978-3-030-34995-0_34


DOI: https://doi.org/10.1007/978-3-030-34995-0_34
Published: 23 November 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-34994-3
Online ISBN: 978-3-030-34995-0
eBook Packages: Computer Science, Computer Science (R0)
