Abstract
This chapter summarises the methods presented for automatic speech and music analysis and the results obtained for speech emotion analytics and music genre identification with the openSMILE toolkit developed by the author. It further discusses if and to what extent the aims defined at the outset were achieved, and outlines open issues for future work.
Notes
1. According to Google Scholar citations.
2. Note that a joint normalisation of the training and test sets might perform best; however, it is not possible, because at training time the test set is unknown (or must not be used!), yet it would be required to normalise the training set.
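The constraint in note 2 can be made concrete with a minimal sketch: estimate per-feature normalisation statistics (here, z-score mean and standard deviation) on the training set only, then apply those same statistics unchanged to the test set. The function names and toy data below are illustrative, not part of openSMILE.

```python
# Sketch: per-feature z-score normalisation where the statistics are
# estimated on the training set only and then applied unchanged to the
# test set (the test set must not influence them).
from statistics import mean, stdev

def fit_normaliser(train_features):
    """Estimate per-feature mean and standard deviation on training data."""
    columns = list(zip(*train_features))       # transpose rows -> feature columns
    mus = [mean(c) for c in columns]
    sigmas = [stdev(c) or 1.0 for c in columns]  # guard against zero variance
    return mus, sigmas

def apply_normaliser(features, mus, sigmas):
    """Apply the training-set statistics to any (training or test) data."""
    return [[(x - m) / s for x, m, s in zip(row, mus, sigmas)]
            for row in features]

# Toy data: 3 training instances, 1 test instance, 2 features each.
train = [[1.0, 10.0], [3.0, 14.0], [5.0, 18.0]]
test = [[2.0, 12.0]]

mus, sigmas = fit_normaliser(train)
train_n = apply_normaliser(train, mus, sigmas)
test_n = apply_normaliser(test, mus, sigmas)   # uses training statistics only
```

Normalising train and test jointly would leak test-set statistics into training, which is exactly what the note warns against.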
Copyright information
© 2016 Springer International Publishing Switzerland
Cite this chapter
Eyben, F. (2016). Discussion and Outlook. In: Real-time Speech and Music Classification by Large Audio Feature Space Extraction. Springer Theses. Springer, Cham. https://doi.org/10.1007/978-3-319-27299-3_7
Print ISBN: 978-3-319-27298-6
Online ISBN: 978-3-319-27299-3