Abstract
We describe our experience in developing a discourse-annotated corpus for community-wide use. Working in the framework of Rhetorical Structure Theory, we were able to create a large annotated resource with very high consistency, using a well-defined methodology and protocol. This resource is made publicly available through the Linguistic Data Consortium to enable researchers to develop empirically grounded, discourse-specific applications.
Copyright information
© 2003 Springer Science+Business Media Dordrecht
About this chapter
Cite this chapter
Carlson, L., Marcu, D., Okurowski, M.E. (2003). Building a Discourse-Tagged Corpus in the Framework of Rhetorical Structure Theory. In: van Kuppevelt, J., Smith, R.W. (eds) Current and New Directions in Discourse and Dialogue. Text, Speech and Language Technology, vol 22. Springer, Dordrecht. https://doi.org/10.1007/978-94-010-0019-2_5
DOI: https://doi.org/10.1007/978-94-010-0019-2_5
Publisher Name: Springer, Dordrecht
Print ISBN: 978-1-4020-1615-8
Online ISBN: 978-94-010-0019-2