Abstract
Topical segmentation is a basic tool for information access to audio records of meetings and other types of speech documents which may be fairly long and contain multiple topics. Standard segmentation algorithms are typically based on keywords, pitch contours or pauses. This work demonstrates that speaker initiative and style may be used as segmentation criteria as well. A probabilistic segmentation procedure is presented which allows the integration and modeling of these features in a clean framework with good results.
Keyword-based segmentation methods degrade significantly on our meeting database when speech recognizer transcripts are used instead of manual transcripts. Speaker initiative is an attractive feature because it delivers good segmentations and should be easy to obtain from the audio. Variation in speech style at the beginning, middle and end of topics may also be exploited for topical segmentation, and it would not require the detection of rare keywords.
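To make the idea of segmenting on speaker behaviour rather than keywords concrete, the following is a minimal sketch and not the probabilistic procedure of the paper. It assumes that each speaker's share of the words in a window is a crude proxy for initiative, and it scores every candidate boundary by the Jensen-Shannon divergence between the share distributions of the turns just before and just after it. The function names, the `(speaker, n_words)` turn format, the window size and the toy data are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def speaker_shares(turns):
    """Fraction of words contributed by each speaker in a list of (speaker, n_words) turns."""
    totals = Counter()
    for speaker, n_words in turns:
        totals[speaker] += n_words
    total = sum(totals.values())
    return {s: n / total for s, n in totals.items()} if total else {}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions given as dicts."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a):
        # Kullback-Leibler divergence of a from the mixture m; zero-probability terms contribute 0.
        return sum(a[k] * math.log2(a[k] / m[k]) for k in a if a[k] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


def boundary_scores(turns, window=2):
    """Score each candidate boundary i (between turn i-1 and turn i).

    The score compares who holds the floor in the `window` turns before the
    boundary with who holds it in the `window` turns after it.
    """
    scores = {}
    for i in range(window, len(turns) - window + 1):
        left = speaker_shares(turns[i - window:i])
        right = speaker_shares(turns[i:i + window])
        scores[i] = js_divergence(left, right)
    return scores


# Toy conversation (assumed data): speaker A dominates the first six turns,
# speaker B dominates the last six.
turns = [("A", 20), ("B", 2), ("A", 20), ("B", 2), ("A", 20), ("B", 2),
         ("A", 2), ("B", 20), ("A", 2), ("B", 20), ("A", 2), ("B", 20)]
scores = boundary_scores(turns, window=2)
best = max(scores, key=scores.get)  # candidate boundary with the largest shift in floor share
```

Because the sketch relies only on who speaks and for how long, it would not depend on recognizing rare keywords correctly; a real system would still need reliable speaker turn detection and a principled way to turn the scores into boundaries.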
I would like to thank my advisor Alex Waibel for supporting and encouraging this work and my colleagues for various discursive and practical contributions, especially Hua Yu and Klaus Zechner. The reviewers provided valuable comments for the final paper presentation. I would also like to thank our sponsors at DARPA. Any opinions, findings and conclusions expressed in this material are those of the authors and do not necessarily reflect the views of DARPA, or any other party.
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ries, K. (2002). Segmenting Conversations by Topic, Initiative, and Style. In: Coden, A.R., Brown, E.W., Srinivasan, S. (eds) Information Retrieval Techniques for Speech Applications. IRTSA 2001. Lecture Notes in Computer Science, vol 2273. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45637-6_5
DOI: https://doi.org/10.1007/3-540-45637-6_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43156-5
Online ISBN: 978-3-540-45637-7
eBook Packages: Springer Book Archive