Segmenting Conversations by Topic, Initiative, and Style

  • Conference paper
Information Retrieval Techniques for Speech Applications (IRTSA 2001)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2273)

Abstract

Topical segmentation is a basic tool for information access to audio records of meetings and other types of speech documents which may be fairly long and contain multiple topics. Standard segmentation algorithms are typically based on keywords, pitch contours or pauses. This work demonstrates that speaker initiative and style may be used as segmentation criteria as well. A probabilistic segmentation procedure is presented which allows the integration and modeling of these features in a clean framework with good results.

Keyword-based segmentation methods degrade significantly on our meeting database when speech recognizer transcripts are used instead of manual transcripts. Speaker initiative is an interesting feature because it delivers good segmentations and should be easy to obtain from the audio. Variation in speech style at the beginning, middle, and end of topics may also be exploited for topical segmentation and would not require the detection of rare keywords.

I would like to thank my advisor Alex Waibel for supporting and encouraging this work, and my colleagues for various discursive and practical contributions, especially Hua Yu and Klaus Zechner. The reviewers provided valuable comments for the final paper presentation. I would also like to thank our sponsors at DARPA. Any opinions, findings and conclusions expressed in this material are those of the authors and do not necessarily reflect the views of DARPA or any other party.

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ries, K. (2002). Segmenting Conversations by Topic, Initiative, and Style. In: Coden, A.R., Brown, E.W., Srinivasan, S. (eds) Information Retrieval Techniques for Speech Applications. IRTSA 2001. Lecture Notes in Computer Science, vol 2273. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45637-6_5

  • DOI: https://doi.org/10.1007/3-540-45637-6_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-43156-5

  • Online ISBN: 978-3-540-45637-7

  • eBook Packages: Springer Book Archive
