Building a Discourse-Tagged Corpus in the Framework of Rhetorical Structure Theory

Current and New Directions in Discourse and Dialogue

Part of the book series: Text, Speech and Language Technology ((TLTB,volume 22))

Abstract

We describe our experience in developing a discourse-annotated corpus for community-wide use. Working in the framework of Rhetorical Structure Theory, we were able to create a large annotated resource with very high consistency, using a well-defined methodology and protocol. This resource is made publicly available through the Linguistic Data Consortium to enable researchers to develop empirically grounded, discourse-specific applications.

Copyright information

© 2003 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Carlson, L., Marcu, D., Okurowski, M.E. (2003). Building a Discourse-Tagged Corpus in the Framework of Rhetorical Structure Theory. In: van Kuppevelt, J., Smith, R.W. (eds) Current and New Directions in Discourse and Dialogue. Text, Speech and Language Technology, vol 22. Springer, Dordrecht. https://doi.org/10.1007/978-94-010-0019-2_5

  • DOI: https://doi.org/10.1007/978-94-010-0019-2_5

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-1-4020-1615-8

  • Online ISBN: 978-94-010-0019-2

  • eBook Packages: Springer Book Archive