A Study of Automatic Speech Recognition in Noisy Classroom Environments for Automated Dialog Analysis

Blanchard, Nathaniel; Brady, Michael; Olney, Andrew M.; Glaus, Marci; Sun, Xiaoyi; Nystrand, Martin; Samei, Borhan; Kelly, Sean; D’Mello, Sidney

doi:10.1007/978-3-319-19773-9_3

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9112)

Included in the following conference series:

International Conference on Artificial Intelligence in Education

Abstract

The development of large-scale automatic classroom dialog analysis systems requires accurate speech-to-text translation. A variety of automatic speech recognition (ASR) engines were evaluated for this purpose, using recordings of teachers in noisy classrooms for testing. Google Speech and Bing Speech were the most accurate, with word accuracy scores of 0.56 for Google and 0.52 for Bing, compared to 0.41 for AT&T Watson, 0.08 for Microsoft, 0.14 for Sphinx with the HUB4 model, and 0.00 for Sphinx with the WSJ model. Further analysis revealed that both the Google and Bing engines were largely unaffected by speakers, speech class sessions, and speech characteristics. Bing results were validated across speakers in a laboratory study, and a method of improving Bing results is presented. The results provide a useful understanding of the capabilities of contemporary ASR engines in noisy classroom environments and highlight a list of issues to be aware of when selecting an ASR engine for difficult speech recognition tasks.
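The abstract reports word accuracy scores but does not define the metric. As a rough illustration only, the sketch below uses the common convention of word accuracy as one minus word error rate, where errors are the word-level edit distance between a reference transcript and the ASR output. This definition, the lowercase whitespace tokenization, and the function names `word_edit_distance` and `word_accuracy` are assumptions made for this example, not the authors' stated method.

```python
def word_edit_distance(ref_words, hyp_words):
    """Minimum number of word substitutions, deletions, and insertions
    needed to turn ref_words into hyp_words (Levenshtein distance)."""
    # prev[j] holds the distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def word_accuracy(reference, hypothesis):
    """Word accuracy as 1 - (edit distance / reference length).

    Assumed definition for illustration; tokenization is a simple
    lowercase whitespace split.
    """
    ref_words = reference.lower().split()
    hyp_words = hypothesis.lower().split()
    if not ref_words:
        raise ValueError("reference transcript must contain at least one word")
    errors = word_edit_distance(ref_words, hyp_words)
    return 1.0 - errors / len(ref_words)


if __name__ == "__main__":
    # Hypothetical transcripts, not data from the study.
    print(word_accuracy("open your books to page ten",
                        "open your book to page ten"))
```

Under this definition the score can fall below zero when the ASR output contains many inserted words, so any clipping or alternative normalization would need to be chosen explicitly.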

Author information

Authors and Affiliations

University of Notre Dame, Notre Dame, USA
Nathaniel Blanchard, Michael Brady & Sidney D’Mello
University of Memphis, Memphis, USA
Andrew M. Olney & Borhan Samei
University of Wisconsin-Madison, Madison, USA
Marci Glaus, Xiaoyi Sun & Martin Nystrand
University of Pittsburgh, Pittsburgh, USA
Sean Kelly

Corresponding author

Correspondence to Nathaniel Blanchard.

Editor information

Editors and Affiliations

University of British Columbia, Vancouver, British Columbia, Canada
Cristina Conati
Computer Science Department, Worcester Polytechnic Institute, Worcester, Massachusetts, USA
Neil Heffernan
Department of Computer Science and Software Engineering, University of Canterbury, Christchurch, New Zealand
Antonija Mitrovic
E.T.S.I. Informática, Universidad Nacional de Educación a Distancia, Madrid, Spain
M. Felisa Verdejo

About this paper

Cite this paper

Blanchard, N. et al. (2015). A Study of Automatic Speech Recognition in Noisy Classroom Environments for Automated Dialog Analysis. In: Conati, C., Heffernan, N., Mitrovic, A., Verdejo, M. (eds) Artificial Intelligence in Education. AIED 2015. Lecture Notes in Computer Science, vol 9112. Springer, Cham. https://doi.org/10.1007/978-3-319-19773-9_3

DOI: https://doi.org/10.1007/978-3-319-19773-9_3
Published: 17 June 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19772-2
Online ISBN: 978-3-319-19773-9
eBook Packages: Computer Science (R0)