
Spoken Language Understanding

Chapter in The Conversational Interface

Abstract

Spoken language understanding (SLU) takes the output of the speech recognition component and produces a representation of its meaning that the dialog manager (DM) can use to decide what to do next in the interaction. As systems have become more conversational, allowing users to express their commands and queries more naturally, SLU has become a key research area for the next generation of conversational interfaces. SLU draws on a wide range of technologies for processing text. In this chapter, we provide an overview of these technologies, focusing in particular on those most relevant to the conversational interface.
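To make the speech recognition to SLU to dialog manager hand-off concrete, here is a minimal sketch, assuming a toy flight-booking domain and a hand-written pattern grammar. The names `SemanticFrame` and `understand`, the intent labels, and the small city list are all hypothetical and chosen only for illustration; they are not taken from the chapter. The sketch maps a recognized utterance to a semantic frame (an intent plus slot-value pairs) of the kind a dialog manager could act on. Practical SLU systems usually replace such hand-written patterns with trained classifiers and sequence labelers.

```python
import re
from dataclasses import dataclass, field

# Hypothetical meaning representation: an intent label plus slot-value pairs.
@dataclass
class SemanticFrame:
    intent: str
    slots: dict = field(default_factory=dict)

# Hypothetical hand-written grammar for a toy flight domain.
INTENT_PATTERNS = [
    ("book_flight", re.compile(r"\b(book|reserve)\b.*\bflight\b")),
    ("greeting", re.compile(r"^(hi|hello)\b")),
]

# Hypothetical city lexicon; restricting slot values to known cities keeps
# phrases like "want to book" from being read as a destination.
CITIES = ("boston", "denver", "london")
CITY = "|".join(CITIES)
SLOT_PATTERNS = {
    "from_city": re.compile(rf"\bfrom ({CITY})\b"),
    "to_city": re.compile(rf"\bto ({CITY})\b"),
}

def understand(asr_text: str) -> SemanticFrame:
    """Map a recognized utterance (ASR 1-best text) to a SemanticFrame."""
    text = asr_text.lower().strip()
    # Intent: first pattern that matches wins; otherwise "unknown".
    intent = "unknown"
    for label, pattern in INTENT_PATTERNS:
        if pattern.search(text):
            intent = label
            break
    # Slots: fill each slot whose pattern matches somewhere in the utterance.
    slots = {}
    for slot, pattern in SLOT_PATTERNS.items():
        match = pattern.search(text)
        if match:
            slots[slot] = match.group(1)
    return SemanticFrame(intent, slots)

if __name__ == "__main__":
    print(understand("I want to book a flight from Boston to Denver"))
    # SemanticFrame(intent='book_flight', slots={'from_city': 'boston', 'to_city': 'denver'})
```

A dialog manager would then branch on `frame.intent` and check which required slots are still missing before deciding whether to ask a follow-up question or act on the request.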




Author information

Corresponding author

Correspondence to Michael McTear.


Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

McTear, M., Callejas, Z., Griol, D. (2016). Spoken Language Understanding. In: The Conversational Interface. Springer, Cham. https://doi.org/10.1007/978-3-319-32967-3_8


  • DOI: https://doi.org/10.1007/978-3-319-32967-3_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-32965-9

  • Online ISBN: 978-3-319-32967-3

  • eBook Packages: Engineering, Engineering (R0)
