
External Attention LSTM Models for Cognitive Load Classification from Speech

  • Conference paper
Statistical Language and Speech Processing (SLSP 2019)

Abstract

Cognitive Load (CL) refers to the amount of mental demand that a given task imposes on an individual’s cognitive system, and very high load levels can impair productivity. In this paper, we propose an automatic system capable of classifying the CL level of speakers by analyzing their voices. We focus on the use of Long Short-Term Memory (LSTM) networks with different weighted pooling strategies: mean-pooling, max-pooling, last-pooling, and a logistic regression attention model. In addition, as an alternative to these methods, we propose a novel attention mechanism, called the external attention model, that uses external cues, such as log-energy and fundamental frequency, to weight the contribution of each LSTM temporal frame, overcoming the need for a large amount of data to train the attention model. Experiments show that the LSTM-based system with the external attention model significantly outperforms both the baseline system based on Support Vector Machines (SVM) and the LSTM-based systems with the conventional weighted pooling schemes and with the logistic regression attention model.
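The pooling strategies compared in the abstract can be illustrated with a minimal NumPy sketch. This is an illustrative simplification, not the paper's implementation: `frames` stands in for per-frame LSTM hidden states, and the external attention weights are obtained here by a simple softmax over a single cue (e.g. log-energy), whereas the paper combines several external cues. All function and variable names below are hypothetical.

```python
import numpy as np

def mean_pool(frames):
    """Average the LSTM outputs over all time frames."""
    return frames.mean(axis=0)

def max_pool(frames):
    """Take the element-wise maximum over all time frames."""
    return frames.max(axis=0)

def last_pool(frames):
    """Keep only the last frame's LSTM output."""
    return frames[-1]

def external_attention_pool(frames, cues):
    """Weighted pooling of LSTM frame outputs driven by an external cue.

    frames: (T, d) array of per-frame LSTM hidden states.
    cues:   (T,) array of one external cue value per frame (e.g. log-energy).
    Returns a (d,) utterance-level embedding.
    """
    # A softmax over the cue values yields one attention weight per frame.
    # Because the weights come from the signal itself, no extra parameters
    # have to be trained for the attention model.
    w = np.exp(cues - cues.max())
    w /= w.sum()
    return w @ frames  # (T,) @ (T, d) -> (d,)

# Toy example: 4 frames, 3-dimensional LSTM outputs, log-energy as the cue.
H = np.array([[0.1, 0.2, 0.3],
              [0.4, 0.5, 0.6],
              [0.7, 0.8, 0.9],
              [1.0, 1.1, 1.2]])
log_energy = np.array([-2.0, 0.5, 1.0, -1.0])
emb = external_attention_pool(H, log_energy)
```

The key contrast with a trained attention model (such as the logistic regression variant) is that the frame weights here are a fixed function of external cues, so the mechanism adds no trainable parameters and therefore needs no extra data to fit.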

The work leading to these results has been partly supported by Spanish Government grants TEC2017-84395-P and TEC2017-84593-C2-1-R.



Acknowledgments

We would like to thank Prof. J. Epps for kindly providing the CSLE dataset and Prof. B. Schuller and the rest of the ComParE 2014 organizers for kindly providing the dataset partition and the baseline system.

Author information

Corresponding author

Correspondence to Ascensión Gallardo-Antolín.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Gallardo-Antolín, A., Montero, J.M. (2019). External Attention LSTM Models for Cognitive Load Classification from Speech. In: Martín-Vide, C., Purver, M., Pollak, S. (eds) Statistical Language and Speech Processing. SLSP 2019. Lecture Notes in Computer Science, vol. 11816. Springer, Cham. https://doi.org/10.1007/978-3-030-31372-2_12

  • DOI: https://doi.org/10.1007/978-3-030-31372-2_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-31371-5

  • Online ISBN: 978-3-030-31372-2

  • eBook Packages: Computer Science (R0)
