
External Attention LSTM Models for Cognitive Load Classification from Speech

  • Conference paper
Statistical Language and Speech Processing (SLSP 2019)

Abstract

Cognitive Load (CL) refers to the amount of mental demand that a given task imposes on an individual’s cognitive system, and very high load levels can impair productivity. In this paper, we propose an automatic system capable of classifying the CL level of speakers by analyzing their voices. We focus on the use of Long Short-Term Memory (LSTM) networks with different weighted pooling strategies: mean-pooling, max-pooling, last-pooling, and a logistic regression attention model. In addition, as an alternative to these methods, we propose a novel attention mechanism, called the external attention model, that uses external cues, such as log-energy and fundamental frequency, to weight the contribution of each LSTM temporal frame, overcoming the need for a large amount of data to train the attention model. Experiments show that the LSTM-based system with the external attention model significantly outperforms both the baseline system based on Support Vector Machines (SVM) and the LSTM-based systems with the conventional weighted pooling schemes and with the logistic regression attention model.
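The pooling strategies compared in the abstract can be illustrated with a minimal NumPy sketch. This is an illustrative simplification, not the paper's implementation: `frames` stands in for per-frame LSTM hidden states, and the external attention weights are obtained here by a simple softmax over a single cue (e.g. log-energy), whereas the paper combines several external cues. All function and variable names below are hypothetical.

```python
import numpy as np

def mean_pool(frames):
    """Average the LSTM outputs over all time frames."""
    return frames.mean(axis=0)

def max_pool(frames):
    """Take the element-wise maximum over all time frames."""
    return frames.max(axis=0)

def last_pool(frames):
    """Keep only the last frame's LSTM output."""
    return frames[-1]

def external_attention_pool(frames, cues):
    """Weighted pooling of LSTM frame outputs driven by an external cue.

    frames: (T, d) array of per-frame LSTM hidden states.
    cues:   (T,) array of one external cue value per frame (e.g. log-energy).
    Returns a (d,) utterance-level embedding.
    """
    # A softmax over the cue values yields one attention weight per frame.
    # Because the weights come from the signal itself, no extra parameters
    # have to be trained for the attention model.
    w = np.exp(cues - cues.max())
    w /= w.sum()
    return w @ frames  # (T,) @ (T, d) -> (d,)

# Toy example: 4 frames, 3-dimensional LSTM outputs, log-energy as the cue.
H = np.array([[0.1, 0.2, 0.3],
              [0.4, 0.5, 0.6],
              [0.7, 0.8, 0.9],
              [1.0, 1.1, 1.2]])
log_energy = np.array([-2.0, 0.5, 1.0, -1.0])
emb = external_attention_pool(H, log_energy)
```

The key contrast with a trained attention model (such as the logistic regression variant) is that the frame weights here are a fixed function of external cues, so the mechanism adds no trainable parameters and therefore needs no extra data to fit.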

The work leading to these results has been partly supported by Spanish Government grants TEC2017-84395-P and TEC2017-84593-C2-1-R.



Acknowledgments

We would like to thank Prof. J. Epps for kindly providing the CSLE dataset and Prof. B. Schuller and the rest of the ComParE 2014 organizers for kindly providing the dataset partition and the baseline system.

Author information

Corresponding author

Correspondence to Ascensión Gallardo-Antolín.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Gallardo-Antolín, A., Montero, J.M. (2019). External Attention LSTM Models for Cognitive Load Classification from Speech. In: Martín-Vide, C., Purver, M., Pollak, S. (eds) Statistical Language and Speech Processing. SLSP 2019. Lecture Notes in Computer Science, vol. 11816. Springer, Cham. https://doi.org/10.1007/978-3-030-31372-2_12

  • DOI: https://doi.org/10.1007/978-3-030-31372-2_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-31371-5

  • Online ISBN: 978-3-030-31372-2

  • eBook Packages: Computer Science (R0)
