Toward Adaptive Information Fusion in Multimodal Systems

  • Conference paper
Machine Learning for Multimodal Interaction (MLMI 2005)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3869)

Abstract

In recent years, a new generation of multimodal systems has emerged as a major direction within the HCI community. Multimodal interfaces and architectures are time-critical and data-intensive to develop, which poses new research challenges. The goal of the present work is to model and adapt to users’ multimodal integration patterns, so that faster and more robust systems can be developed with on-line adaptation to individuals’ multimodal temporal thresholds. In this paper, we summarize past user-modeling results on speech and pen multimodal integration patterns, which indicate that there are two dominant types of multimodal integration patterns among users that can be detected very early and remain highly consistent. The empirical results also indicate that, when interacting with a multimodal system, users intermix unimodal with multimodal commands. Based on these results, we present new machine-learning results comparing three models of on-line system adaptation to users’ integration patterns, all based on Bayesian Belief Networks. This work utilized data from ten adults who provided approximately 1,000 commands while interacting with a map-based multimodal system. Initial experimental results with our learning models indicated that 85% of users’ natural mixed input could be correctly classified as either unimodal or multimodal, and 82% of users’ multimodal input could be correctly classified as either sequentially or simultaneously integrated. The long-term goal of this research is to develop new strategies for combining empirical user modeling with machine-learning techniques to bootstrap accelerated, generalized, and more reliable information fusion in new types of multimodal systems.
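
To make the on-line adaptation idea concrete, the Python snippet below is a minimal sketch, not the authors' Bayesian Belief Network models: it maintains a running per-user Gaussian likelihood over the temporal lag between pen and speech signals and classifies each multimodal command as simultaneously or sequentially integrated. The feature choice, class priors, and all numeric values are illustrative assumptions, not figures from the paper.

from dataclasses import dataclass
from math import exp, pi, sqrt


@dataclass
class RunningGaussian:
    """Incrementally estimated Gaussian likelihood for one integration pattern."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations (Welford's on-line algorithm)

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def pdf(self, x: float) -> float:
        # Fall back to unit variance until at least two observations are seen.
        var = max(self.m2 / self.n, 1e-6) if self.n > 1 else 1.0
        return exp(-((x - self.mean) ** 2) / (2.0 * var)) / sqrt(2.0 * pi * var)


class IntegrationPatternClassifier:
    """On-line adaptation to one user's dominant multimodal integration pattern."""

    def __init__(self) -> None:
        self.models = {"simultaneous": RunningGaussian(), "sequential": RunningGaussian()}
        self.counts = {"simultaneous": 1, "sequential": 1}  # Laplace-smoothed class priors

    def observe(self, lag_ms: float, label: str) -> None:
        """Update the per-user model once the true integration pattern is known."""
        self.models[label].update(lag_ms)
        self.counts[label] += 1

    def classify(self, lag_ms: float) -> str:
        """Return the pattern with the highest posterior for this inter-modal lag."""
        total = sum(self.counts.values())
        scores = {
            label: (self.counts[label] / total) * model.pdf(lag_ms)
            for label, model in self.models.items()
        }
        return max(scores, key=scores.get)


if __name__ == "__main__":
    clf = IntegrationPatternClassifier()
    # Hypothetical commands: negative lag means speech overlapped the pen gesture.
    for lag, label in [(-350.0, "simultaneous"), (-120.0, "simultaneous"),
                       (800.0, "sequential"), (1200.0, "sequential")]:
        clf.observe(lag, label)
    print(clf.classify(-200.0))  # expected: simultaneous
    print(clf.classify(950.0))   # expected: sequential

A full adaptive fusion engine would, as the abstract notes, also decide whether an incoming signal is unimodal or part of a multimodal construction and would compare several model structures; this sketch only illustrates the on-line, per-user flavor of that adaptation.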


Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Huang, X., Oviatt, S. (2006). Toward Adaptive Information Fusion in Multimodal Systems. In: Renals, S., Bengio, S. (eds) Machine Learning for Multimodal Interaction. MLMI 2005. Lecture Notes in Computer Science, vol 3869. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11677482_2

  • DOI: https://doi.org/10.1007/11677482_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-32549-9

  • Online ISBN: 978-3-540-32550-5

  • eBook Packages: Computer Science (R0)
