Spectro-temporal directional derivative based automatic speech recognition for a serious game scenario

Muhammad, Ghulam; Masud, Mehedi; Alelaiwi, Abdulhameed; Rahman, Md. Abdur; Karime, Ali; Alamri, Atif; Hossain, M. Shamim

doi:10.1007/s11042-014-1973-7

Published: 02 May 2014

Volume 74, pages 5313–5327, (2015)

Multimedia Tools and Applications

Abstract

Speech is an important modality in a serious game platform, and serious games can be very useful for the rehabilitation of individuals with voice disorders. Such a platform therefore needs an efficient, high-performance automatic speech recognition (ASR) system. In this paper, we propose a spectro-temporal directional derivative (STDD) feature that requires fewer computations in the modeling stage and yet gives high recognition accuracy in the ASR system. The proposed STDD feature is obtained by applying different directional derivative filters in the spectro-temporal domain. The feature dimension is then compressed by the discrete cosine transform. The experiments are performed with voice samples of Arabic numerals spoken by persons with and without voice pathology. The experimental results show that the STDD feature outperforms conventional mel-frequency cepstral coefficients in both clean and noisy environments.
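
The pipeline described in the abstract (directional derivative filtering in the spectro-temporal domain, followed by discrete cosine transform compression) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the 3x3 Sobel-style kernels, the four directions, the edge handling, the number of retained coefficients, and the names `KERNELS`, `filter2d`, `dct2`, and `stdd_features`. The input is assumed to be a log-spectrogram stored as a `[band][frame]` matrix.

```python
import math

# Hypothetical 3x3 directional derivative kernels (Sobel-style).
# The abstract does not specify the filters, so these are placeholders.
KERNELS = {
    "time_0deg": [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]],
    "diag_45deg": [[0, 1, 2], [-1, 0, 1], [-2, -1, 0]],
    "freq_90deg": [[1, 2, 1], [0, 0, 0], [-1, -2, -1]],
    "diag_135deg": [[2, 1, 0], [1, 0, -1], [0, -1, -2]],
}


def filter2d(spec, kernel):
    """Correlate a [band][frame] matrix with a 3x3 kernel, replicating edges."""
    n_bands, n_frames = len(spec), len(spec[0])
    out = [[0.0] * n_frames for _ in range(n_bands)]
    for b in range(n_bands):
        for t in range(n_frames):
            acc = 0.0
            for i in (-1, 0, 1):          # offset along the band axis
                for j in (-1, 0, 1):      # offset along the frame axis
                    bb = min(max(b + i, 0), n_bands - 1)
                    tt = min(max(t + j, 0), n_frames - 1)
                    acc += kernel[i + 1][j + 1] * spec[bb][tt]
            out[b][t] = acc
    return out


def dct2(values, n_coeffs):
    """Unnormalised DCT-II, keeping only the first n_coeffs coefficients."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (m + 0.5) / n) for m, v in enumerate(values))
        for k in range(n_coeffs)
    ]


def stdd_features(log_spec, n_coeffs=4):
    """Return one vector per frame: DCT-compressed directional derivatives."""
    n_frames = len(log_spec[0])
    filtered = [filter2d(log_spec, k) for k in KERNELS.values()]
    features = []
    for t in range(n_frames):
        frame = []
        for response in filtered:
            column = [row[t] for row in response]  # all bands at frame t
            frame.extend(dct2(column, n_coeffs))   # compress across bands
        features.append(frame)
    return features


# Toy log-spectrogram: 6 bands x 5 frames, energy rising linearly over time.
toy = [[float(t) for t in range(5)] for _ in range(6)]
feats = stdd_features(toy)
print(len(feats), len(feats[0]))
```

In this toy input every band is identical, so the frequency-direction response is zero and only the filters with a time component contribute. In a real system the resulting per-frame vectors would be passed to the recognizer in place of MFCCs; the choice of recognizer is outside the scope of this sketch.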

Acknowledgments

The authors extend their appreciation to the Deanship of Scientific Research at King Saud University, Riyadh, Saudi Arabia, for funding this work through research group project No. RGP-VPP-228.

Author information

Authors and Affiliations

Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
Ghulam Muhammad
Department of Computer Science, Taif University, Taif, Saudi Arabia
Mehedi Masud
Department of Software Engineering, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
Abdulhameed Alelaiwi & M. Shamim Hossain
Department of Computer Science, Umm Al-Qura University, Makkah, Saudi Arabia
Md. Abdur Rahman
Multimedia Communications Research Laboratory, University of Ottawa, Ottawa, Ontario, Canada
Ali Karime
Department of Information System, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
Atif Alamri

Corresponding author

Correspondence to Ghulam Muhammad.

About this article

Cite this article

Muhammad, G., Masud, M., Alelaiwi, A. et al. Spectro-temporal directional derivative based automatic speech recognition for a serious game scenario. Multimed Tools Appl 74, 5313–5327 (2015). https://doi.org/10.1007/s11042-014-1973-7

Issue Date: July 2015
