Abstract
Speech is the most natural form of communication for human beings and, among other things, it conveys information about the speaker's emotional state. The current study focuses on automatic speech emotion recognition based on classic and novel machine learning approaches using simulated emotional speech data. Specifically, three approaches are used: individual Gaussian mixture models (GMMs) trained for each emotion; a universal background GMM (UBM-GMM) adapted to each emotion using maximum a posteriori (MAP) adaptation; and an approach based on the i-vector paradigm, which is widely used in speaker recognition and language identification and is adapted here to emotion recognition. When using individual GMMs, a novel technique based on multiple classifiers and late fusion is also applied; in this case, a 90.9% recognition rate is obtained. When the state-of-the-art i-vector paradigm is used along with a probabilistic linear discriminant analysis (PLDA) model, a 91.4% average rate for speaker-independent Japanese speech emotion recognition is achieved, which is a very promising result and superior to those of similar studies. In addition to Japanese emotion recognition, pair-wise recognition of seven emotions in German has also been conducted. The recognition rates obtained using the German database show the same tendency as in Japanese; in this experiment, an 89.2% average rate has been achieved.
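The two GMM-based approaches named in the abstract (one GMM per emotion, and a UBM-GMM MAP-adapted to each emotion) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: it assumes diagonal covariances, means-only MAP adaptation with a relevance factor of 16 (a common choice in speaker recognition), and classification by the highest average frame log-likelihood. All function names (`train_gmm`, `map_adapt_means`, `classify`) are hypothetical, feature extraction is omitted, and the data are synthetic Gaussian stand-ins for per-frame acoustic features. The i-vector/PLDA system and the late-fusion step are not shown.

```python
import numpy as np

def _logsumexp(a, axis):
    # Numerically stable log(sum(exp(a))) along one axis.
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)

def _log_gauss_diag(X, means, variances):
    # Per-frame, per-component diagonal-Gaussian log-densities, shape (n_frames, k).
    d = X.shape[1]
    diff = X[:, None, :] - means[None, :, :]
    return -0.5 * (d * np.log(2 * np.pi)
                   + np.log(variances).sum(axis=1)[None, :]
                   + (diff ** 2 / variances[None, :, :]).sum(axis=2))

def posteriors(X, gmm):
    # Component posteriors gamma[t, k] and per-frame log-likelihoods log p(x_t).
    weights, means, variances = gmm
    lp = np.log(weights)[None, :] + _log_gauss_diag(X, means, variances)
    ll = _logsumexp(lp, axis=1)
    return np.exp(lp - ll[:, None]), ll

def train_gmm(X, k, n_iter=50, var_floor=1e-3, seed=0):
    # Plain EM for a diagonal-covariance GMM (per-emotion model or UBM).
    rng = np.random.default_rng(seed)
    means = X[rng.choice(len(X), size=k, replace=False)].copy()
    variances = np.tile(np.maximum(X.var(axis=0), var_floor), (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        post, _ = posteriors(X, (weights, means, variances))
        nk = post.sum(axis=0) + 1e-10
        weights = nk / nk.sum()
        means = (post.T @ X) / nk[:, None]
        variances = np.maximum((post.T @ X ** 2) / nk[:, None] - means ** 2, var_floor)
    return weights, means, variances

def map_adapt_means(X, ubm, relevance=16.0):
    # Means-only MAP adaptation of the UBM to one emotion's frames:
    # mu_k' = alpha_k * E_k[x] + (1 - alpha_k) * mu_k, alpha_k = n_k / (n_k + r).
    weights, means, variances = ubm
    post, _ = posteriors(X, ubm)
    nk = post.sum(axis=0)
    ex = (post.T @ X) / np.maximum(nk, 1e-10)[:, None]
    alpha = (nk / (nk + relevance))[:, None]
    return weights, alpha * ex + (1.0 - alpha) * means, variances

def classify(X, models):
    # Choose the emotion whose model gives the highest mean frame log-likelihood.
    scores = {emo: posteriors(X, gmm)[1].mean() for emo, gmm in models.items()}
    return max(scores, key=scores.get), scores

# Usage on synthetic 4-dimensional "frames" (not real speech features).
rng = np.random.default_rng(1)
train = {"neutral": rng.normal(0.0, 1.0, (400, 4)),
         "angry": rng.normal(3.0, 1.0, (400, 4))}

# (a) Individual GMMs: one GMM trained per emotion.
individual = {e: train_gmm(X, k=2) for e, X in train.items()}

# (b) UBM-GMM: one GMM on the pooled data, then MAP-adapted to each emotion.
ubm = train_gmm(np.vstack(list(train.values())), k=4)
adapted = {e: map_adapt_means(X, ubm) for e, X in train.items()}

test_utt = rng.normal(3.0, 1.0, (50, 4))
label_individual, _ = classify(test_utt, individual)
label_adapted, _ = classify(test_utt, adapted)
```

In the adapted models only the means move; weights and variances are shared with the UBM, so the relevance factor controls how far each component is pulled toward one emotion's data. Real systems would typically use many more mixture components and frame-level features such as MFCCs.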
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Heracleous, P., Ishikawa, A., Yasuda, K., Kawashima, H., Sugaya, F., Hashimoto, M. (2018). Machine Learning Approaches for Speech Emotion Recognition: Classic and Novel Advances. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2017. Lecture Notes in Computer Science, vol 10762. Springer, Cham. https://doi.org/10.1007/978-3-319-77116-8_14
DOI: https://doi.org/10.1007/978-3-319-77116-8_14
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-77115-1
Online ISBN: 978-3-319-77116-8
eBook Packages: Computer Science, Computer Science (R0)