Toward Adaptive Information Fusion in Multimodal Systems

  • Conference paper
Machine Learning for Multimodal Interaction (MLMI 2005)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3869)

Abstract

In recent years, a new generation of multimodal systems has emerged as a major direction within the HCI community. Multimodal interfaces and architectures are time-critical and data-intensive to develop, which poses new research challenges. The goal of the present work is to model and adapt to users’ multimodal integration patterns, so that faster and more robust systems can be developed with on-line adaptation to individuals’ multimodal temporal thresholds. In this paper, we summarize past user-modeling results on speech and pen multimodal integration patterns, which indicate that there are two dominant types of multimodal integration patterns among users that can be detected very early and remain highly consistent. The empirical results also indicate that, when interacting with a multimodal system, users intermix unimodal with multimodal commands. Based on these results, we present new machine-learning results comparing three models of on-line system adaptation to users’ integration patterns, all based on Bayesian Belief Networks. This work utilized data from ten adults who provided approximately 1,000 commands while interacting with a map-based multimodal system. Initial experimental results with our learning models indicated that 85% of users’ natural mixed input could be correctly classified as either unimodal or multimodal, and 82% of users’ multimodal input could be correctly classified as either sequentially or simultaneously integrated. The long-term goal of this research is to develop new strategies for combining empirical user modeling with machine-learning techniques to bootstrap accelerated, generalized, and more reliable information fusion in new types of multimodal systems.
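
To make the on-line adaptation idea concrete, the Python snippet below is a minimal sketch, not the authors' Bayesian Belief Network models: it maintains a running per-user Gaussian likelihood over the temporal lag between pen and speech signals and classifies each multimodal command as simultaneously or sequentially integrated. The feature choice, class priors, and all numeric values are illustrative assumptions, not figures from the paper.

from dataclasses import dataclass
from math import exp, pi, sqrt


@dataclass
class RunningGaussian:
    """Incrementally estimated Gaussian likelihood for one integration pattern."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations (Welford's on-line algorithm)

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def pdf(self, x: float) -> float:
        # Fall back to unit variance until at least two observations are seen.
        var = max(self.m2 / self.n, 1e-6) if self.n > 1 else 1.0
        return exp(-((x - self.mean) ** 2) / (2.0 * var)) / sqrt(2.0 * pi * var)


class IntegrationPatternClassifier:
    """On-line adaptation to one user's dominant multimodal integration pattern."""

    def __init__(self) -> None:
        self.models = {"simultaneous": RunningGaussian(), "sequential": RunningGaussian()}
        self.counts = {"simultaneous": 1, "sequential": 1}  # Laplace-smoothed class priors

    def observe(self, lag_ms: float, label: str) -> None:
        """Update the per-user model once the true integration pattern is known."""
        self.models[label].update(lag_ms)
        self.counts[label] += 1

    def classify(self, lag_ms: float) -> str:
        """Return the pattern with the highest posterior for this inter-modal lag."""
        total = sum(self.counts.values())
        scores = {
            label: (self.counts[label] / total) * model.pdf(lag_ms)
            for label, model in self.models.items()
        }
        return max(scores, key=scores.get)


if __name__ == "__main__":
    clf = IntegrationPatternClassifier()
    # Hypothetical commands: negative lag means speech overlapped the pen gesture.
    for lag, label in [(-350.0, "simultaneous"), (-120.0, "simultaneous"),
                       (800.0, "sequential"), (1200.0, "sequential")]:
        clf.observe(lag, label)
    print(clf.classify(-200.0))  # expected: simultaneous
    print(clf.classify(950.0))   # expected: sequential

A full adaptive fusion engine would, as the abstract notes, also decide whether an incoming signal is unimodal or part of a multimodal construction and would compare several model structures; this sketch only illustrates the on-line, per-user flavor of that adaptation.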


Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Huang, X., Oviatt, S. (2006). Toward Adaptive Information Fusion in Multimodal Systems. In: Renals, S., Bengio, S. (eds) Machine Learning for Multimodal Interaction. MLMI 2005. Lecture Notes in Computer Science, vol 3869. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11677482_2

  • DOI: https://doi.org/10.1007/11677482_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-32549-9

  • Online ISBN: 978-3-540-32550-5

  • eBook Packages: Computer Science (R0)
