Spoken Language Understanding

McTear, Michael; Callejas, Zoraida; Griol, David

doi:10.1007/978-3-319-32967-3_8

Abstract

Spoken language understanding (SLU) takes the output of the speech recognition component and produces a representation of its meaning that the dialog manager (DM) can use to decide what to do next in the interaction. As systems have become more conversational, allowing users to express their commands and queries in a more natural way, SLU has become an active research topic for the next generation of conversational interfaces. SLU embraces a wide range of technologies that can be used for various text-processing tasks. In this chapter, we provide an overview of these technologies, focusing in particular on those that are relevant to the conversational interface.

Author information

Authors and Affiliations

School of Computing and Mathematics, Ulster University, Northern Ireland, UK
Michael McTear
ETSI Informática y Telecomunicación, University of Granada, Granada, Spain
Zoraida Callejas
Department of Computer Science, Universidad Carlos III de Madrid, Madrid, Spain
David Griol

Corresponding author

Correspondence to Michael McTear.

About this chapter

Cite this chapter

McTear, M., Callejas, Z., Griol, D. (2016). Spoken Language Understanding. In: The Conversational Interface. Springer, Cham. https://doi.org/10.1007/978-3-319-32967-3_8

DOI: https://doi.org/10.1007/978-3-319-32967-3_8
Published: 20 May 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-32965-9
Online ISBN: 978-3-319-32967-3
eBook Packages: Engineering, Engineering (R0)
