Abstract
Among the many automation possibilities enabled by multi-sensor environments, several of which are discussed in this Handbook, a particularly relevant one is the analysis of social interaction in the workplace and, more specifically, of conversational group interaction. Group conversations are ubiquitous and are a fundamental means through which ideas are discussed, progress is reported, and knowledge is created and disseminated.
Copyright information
© 2010 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Gatica-Perez, D., Odobez, JM. (2010). Visual Attention, Speaking Activity, and Group Conversational Analysis in Multi-Sensor Environments. In: Nakashima, H., Aghajan, H., Augusto, J.C. (eds) Handbook of Ambient Intelligence and Smart Environments. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-93808-0_16
DOI: https://doi.org/10.1007/978-0-387-93808-0_16
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-93807-3
Online ISBN: 978-0-387-93808-0
eBook Packages: Computer Science, Computer Science (R0)