
A Proposal for Processing and Fusioning Multiple Information Sources in Multimodal Dialog Systems

  • Conference paper
Highlights of Practical Applications of Heterogeneous Multi-Agent Systems. The PAAMS Collection (PAAMS 2014)

Abstract

Multimodal dialog systems can be defined as computer systems that process two or more user input modes and combine them with multimedia system output. This paper focuses on the multimodal input, proposing a technique to process and fuse the multiple input modalities in the system's dialog manager, so that a single combined input is used to select the next system action. We describe an application of our technique to build multimodal systems that process users' spoken utterances, tactile and keyboard inputs, and information related to the context of the interaction. In our proposal, this information is divided into external and internal context; the internal context is represented by the detection of the users' intention during the dialog and their emotional state.
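The fusion step described above can be illustrated with a minimal sketch: several modality-level interpretations plus contextual information are merged into one combined frame that a dialog manager could act on. All class and function names below are our own illustration, not the paper's implementation; the highest-confidence (late-fusion) strategy is one common choice among several.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ModalityInput:
    modality: str          # "speech", "tactile", or "keyboard"
    intent: str            # semantic interpretation of the input
    confidence: float      # recognizer confidence in [0, 1]

@dataclass
class Context:
    external: dict = field(default_factory=dict)   # e.g. location, device
    intention: Optional[str] = None                # detected user intention
    emotion: Optional[str] = None                  # detected emotional state

def fuse(inputs: list[ModalityInput], ctx: Context) -> dict:
    """Late fusion: keep the highest-confidence interpretation and
    attach the contextual information as extra features."""
    best = max(inputs, key=lambda i: i.confidence)
    return {
        "intent": best.intent,
        "modality": best.modality,
        "confidence": best.confidence,
        "external_context": ctx.external,
        "user_intention": ctx.intention,
        "user_emotion": ctx.emotion,
    }

# Example: a tactile selection outscores a speech hypothesis.
combined = fuse(
    [ModalityInput("speech", "book_flight", 0.72),
     ModalityInput("tactile", "select_date", 0.95)],
    Context(external={"device": "kiosk"}, intention="travel", emotion="neutral"),
)
print(combined["intent"])  # the dialog manager selects its next action from this frame
```

The single `combined` dictionary plays the role of the "single combined input" the abstract refers to: the dialog manager never sees the raw modality streams, only the fused frame.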




Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Griol, D., Molina, J.M., García-Herrero, J. (2014). A Proposal for Processing and Fusioning Multiple Information Sources in Multimodal Dialog Systems. In: Corchado, J.M., et al. Highlights of Practical Applications of Heterogeneous Multi-Agent Systems. The PAAMS Collection. PAAMS 2014. Communications in Computer and Information Science, vol 430. Springer, Cham. https://doi.org/10.1007/978-3-319-07767-3_16


  • DOI: https://doi.org/10.1007/978-3-319-07767-3_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-07766-6

  • Online ISBN: 978-3-319-07767-3

