Sign Gesture Recognition from Raw Skeleton Information in 3D Using Deep Learning

Rakesh, Sumit; Javed, Saleha; Saini, Rajkumar; Liwicki, Marcus

doi:10.1007/978-981-16-1092-9_16

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1377)

Included in the following conference series:

International Conference on Computer Vision and Image Processing

Abstract

Sign Language Recognition (SLR) narrows the communication gap with hearing-impaired people, i.e., it connects hearing-impaired persons with those who need to communicate with them but do not understand sign language. This paper focuses on an end-to-end deep learning approach for the recognition of sign gestures recorded with a 3D sensor (e.g., Microsoft Kinect). Typical machine-learning-based SLR systems require feature extraction before a model can be applied. These features must be chosen carefully, as recognition performance relies heavily on them. Our proposed end-to-end approach avoids this problem by eliminating the need to extract handcrafted features: deep learning models can work directly on raw data and learn higher-level representations (features) by themselves. To test our hypothesis, we used two recent and promising deep learning models, the Gated Recurrent Unit (GRU) and Bidirectional Long Short-Term Memory (BiLSTM), and trained them using only raw data. We performed a comparative analysis between the two models and against the results of the base paper. The experiments show that the proposed method outperforms the existing work, with GRU reaching 70.78% average accuracy with front-view training.
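
To make the end-to-end idea concrete, the following is a minimal sketch, not the authors' implementation, of how a GRU can consume raw 3D joint coordinates frame by frame and emit class probabilities without any handcrafted features. It uses the GRU update of Cho et al. (2014) in plain Python. The joint count, sequence length, hidden size, and number of classes are illustrative assumptions, the weights are random and untrained, and the names `GRUCell`, `classify`, and `rand_matrix` are hypothetical.

```python
import math
import random

# Illustrative sizes only (assumptions, not values from the paper).
NUM_JOINTS = 20    # skeleton joints per frame, each with x, y, z
NUM_FRAMES = 15    # frames in one gesture sequence
HIDDEN_SIZE = 8    # GRU hidden units
NUM_CLASSES = 5    # number of sign classes


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random weights; a trained model would learn these instead."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """Hypothetical minimal GRU cell (forward pass only)."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        # One (W, U, b) triple per gate: update z, reset r, candidate n.
        self.params = {
            gate: (
                rand_matrix(hidden_size, input_size, rng),
                rand_matrix(hidden_size, hidden_size, rng),
                [0.0] * hidden_size,
            )
            for gate in "zrn"
        }

    def _affine(self, gate, x, h):
        W, U, b = self.params[gate]
        return [a + c + d for a, c, d in zip(matvec(W, x), matvec(U, h), b)]

    def step(self, x, h):
        z = [sigmoid(v) for v in self._affine("z", x, h)]   # update gate
        r = [sigmoid(v) for v in self._affine("r", x, h)]   # reset gate
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(v) for v in self._affine("n", x, reset_h)]  # candidate
        # New state: h_t = z * h_{t-1} + (1 - z) * candidate
        return [zi * hi + (1.0 - zi) * ni for zi, hi, ni in zip(z, h, n)]


def classify(frames, cell, W_out, b_out):
    """Run the GRU over raw frames, then apply a softmax output layer."""
    h = [0.0] * cell.hidden_size
    for frame in frames:          # each frame is a flat list of raw coordinates
        h = cell.step(frame, h)
    logits = [a + b for a, b in zip(matvec(W_out, h), b_out)]
    peak = max(logits)            # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
cell = GRUCell(3 * NUM_JOINTS, HIDDEN_SIZE, rng)
W_out = rand_matrix(NUM_CLASSES, HIDDEN_SIZE, rng)
b_out = [0.0] * NUM_CLASSES

# Synthetic stand-in for one recorded gesture: NUM_FRAMES x (3 * NUM_JOINTS).
frames = [[rng.uniform(-1.0, 1.0) for _ in range(3 * NUM_JOINTS)]
          for _ in range(NUM_FRAMES)]

probs = classify(frames, cell, W_out, b_out)
print(len(probs), round(sum(probs), 6))
```

The only input is the flattened coordinate vector of each frame, which is the sense in which the approach is end-to-end. A bidirectional variant would run a second recurrent pass over the reversed frame sequence and concatenate the two final states before the output layer.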

Notes

1. http://parimal.iitr.ac.in/dataset

Author information

Authors and Affiliations

Luleå Tekniska Universitet, Luleå, Sweden
Sumit Rakesh, Saleha Javed, Rajkumar Saini & Marcus Liwicki

Corresponding author

Correspondence to Rajkumar Saini.

Editor information

Editors and Affiliations

Indian Institute of Information Technology Allahabad, Prayagraj, India
Satish Kumar Singh
Indian Institute of Technology Roorkee, Roorkee, India
Partha Roy
Indian Institute of Technology Roorkee, Roorkee, India
Balasubramanian Raman
Indian Institute of Information Technology Allahabad, Prayagraj, India
P. Nagabhushan

About this paper

Cite this paper

Rakesh, S., Javed, S., Saini, R., Liwicki, M. (2021). Sign Gesture Recognition from Raw Skeleton Information in 3D Using Deep Learning. In: Singh, S.K., Roy, P., Raman, B., Nagabhushan, P. (eds) Computer Vision and Image Processing. CVIP 2020. Communications in Computer and Information Science, vol 1377. Springer, Singapore. https://doi.org/10.1007/978-981-16-1092-9_16

DOI: https://doi.org/10.1007/978-981-16-1092-9_16
Published: 28 March 2021
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-1091-2
Online ISBN: 978-981-16-1092-9
eBook Packages: Computer Science, Computer Science (R0)
