Sign Gesture Recognition from Raw Skeleton Information in 3D Using Deep Learning

  • Conference paper
Computer Vision and Image Processing (CVIP 2020)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1377)

Abstract

Sign Language Recognition (SLR) narrows the communication gap with hearing-impaired people: it connects hearing-impaired persons with those who need to communicate with them but do not understand sign language. This paper focuses on an end-to-end deep learning approach for recognizing sign gestures recorded with a 3D sensor (e.g., Microsoft Kinect). Typical machine-learning-based SLR systems require feature extraction before a model can be applied. These features must be chosen carefully because recognition performance relies heavily on them. Our proposed end-to-end approach avoids this problem by removing the need for handcrafted features: deep learning models can work directly on raw data and learn higher-level representations (features) by themselves. To test our hypothesis, we used two recent and promising deep learning models, Gated Recurrent Unit (GRU) and Bidirectional Long Short-Term Memory (BiLSTM), and trained them using only raw data. We compared the two models with each other and with the results of the base paper. The experiments show that the proposed method outperforms the existing work, with GRU achieving 70.78% average accuracy with front-view training.
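
To make the idea of feeding raw skeleton data to a recurrent model concrete, the following is a minimal sketch of a GRU forward pass over a sequence of 3D joint coordinates, written in plain NumPy. It is not the authors' implementation: the weights are random and untrained, and the joint count (20), frame count (40), hidden size (32), and number of sign classes (30) are hypothetical values chosen only for illustration. The helper names `init_gru` and `gru_classify` are likewise assumptions.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_gru(input_dim, hidden_dim, num_classes, seed=0):
    """Create small random (untrained) GRU and output-layer weights."""
    rng = np.random.default_rng(seed)

    def w(rows, cols):
        return rng.normal(0.0, 0.1, size=(rows, cols))

    params = {}
    for gate in ("z", "r", "h"):  # update gate, reset gate, candidate state
        params["W" + gate] = w(input_dim, hidden_dim)
        params["U" + gate] = w(hidden_dim, hidden_dim)
        params["b" + gate] = np.zeros(hidden_dim)
    params["Wo"] = w(hidden_dim, num_classes)
    params["bo"] = np.zeros(num_classes)
    return params


def gru_classify(seq, p):
    """Run a GRU over seq of shape (frames, input_dim); return class probabilities."""
    h = np.zeros(p["Uz"].shape[0])
    for x in seq:  # one frame of raw joint coordinates per step
        z = sigmoid(x @ p["Wz"] + h @ p["Uz"] + p["bz"])
        r = sigmoid(x @ p["Wr"] + h @ p["Ur"] + p["br"])
        h_cand = np.tanh(x @ p["Wh"] + (r * h) @ p["Uh"] + p["bh"])
        # One common convention: z close to 1 keeps the previous state.
        h = (1.0 - z) * h_cand + z * h
    logits = h @ p["Wo"] + p["bo"]
    e = np.exp(logits - logits.max())  # numerically stable softmax
    return e / e.sum()


# Hypothetical sizes, not taken from the paper.
NUM_JOINTS, NUM_FRAMES, HIDDEN, NUM_CLASSES = 20, 40, 32, 30

# Stand-in for one recorded gesture: flattened (x, y, z) per joint per frame.
gesture = np.random.default_rng(1).normal(size=(NUM_FRAMES, NUM_JOINTS * 3))
probs = gru_classify(gesture, init_gru(NUM_JOINTS * 3, HIDDEN, NUM_CLASSES))
print(probs.argmax(), probs.sum())
```

The point of the sketch is only that no handcrafted features appear anywhere: each frame's coordinates enter the recurrence unchanged, and the final hidden state is mapped to class probabilities. A trained model would learn these weights from labelled gestures, and a bidirectional variant would additionally process the sequence in reverse.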

Notes

  1. http://parimal.iitr.ac.in/dataset.

Corresponding author

Correspondence to Rajkumar Saini.

Copyright information

© 2021 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Rakesh, S., Javed, S., Saini, R., Liwicki, M. (2021). Sign Gesture Recognition from Raw Skeleton Information in 3D Using Deep Learning. In: Singh, S.K., Roy, P., Raman, B., Nagabhushan, P. (eds) Computer Vision and Image Processing. CVIP 2020. Communications in Computer and Information Science, vol 1377. Springer, Singapore. https://doi.org/10.1007/978-981-16-1092-9_16

  • DOI: https://doi.org/10.1007/978-981-16-1092-9_16

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-16-1091-2

  • Online ISBN: 978-981-16-1092-9

  • eBook Packages: Computer Science, Computer Science (R0)
