A sequence labelling approach for automatic analysis of ello: tagging pronouns, antecedents, and connective phrases

Parodi, Giovanni; Evans, Richard; Ha, Le An; Mitkov, Ruslan; Julio Vergara, Cristóbal Jesus; Olivares-López, Raúl Ignacio

doi:10.1007/s10579-021-09559-z

Research Article
Published: 04 September 2021

Language Resources and Evaluation, volume 56, pages 139–164 (2022)

Giovanni Parodi¹,
Richard Evans ORCID: orcid.org/0000-0002-1220-8605²,
Le An Ha²,
Ruslan Mitkov²,
Cristóbal Jesus Julio Vergara¹ &
Raúl Ignacio Olivares-López¹

Abstract

Encapsulators are linguistic units which establish coherent referential connections to the preceding discourse in a text. In this paper, we address the challenge of automatically analysing the pronominal encapsulator ello in Spanish text. Our method identifies, for each occurrence, the antecedent of the pronoun (including its grammatical type), the connective phrase which combines with the pronoun to express a discourse relation linking the antecedent text segment to the following text segment, and the type of semantic relation expressed by the complex discourse marker formed by the connective phrase and pronoun. We describe our annotation of a corpus to inform the development of our method and to fine-tune an automatic analyser based on Bidirectional Encoder Representations from Transformers (BERT). On testing our method, we find that it performs with greater accuracy than three baselines (0.76 for the resolution task), and sets a promising benchmark for the automatic annotation of occurrences of the pronoun ello, their antecedents, and the semantic relations between the two text segments linked by the connective in combination with the pronoun.
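
As a rough illustration of what a sequence labelling formulation of this task can look like, the sketch below groups per-token tags into labelled spans. The BIO tagging scheme, the label names (ANT, CONN, PRON), and the example sentence are assumptions made for illustration only; the abstract does not specify the label inventory or encoding actually used.

```python
from typing import List, Optional, Tuple


def decode_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (label, text) spans."""
    spans: List[Tuple[str, str]] = []
    start: Optional[int] = None
    label: Optional[str] = None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = label is not None and tag == "I-" + label
        if label is not None and not continues:
            spans.append((label, " ".join(tokens[start:i])))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


# Hypothetical example: "Prices rose. Because of this, demand fell."
tokens = ["Subieron", "los", "precios", ".",
          "Debido", "a", "ello", ",", "cayó", "la", "demanda", "."]
# Hypothetical labels: ANT = antecedent, CONN = connective phrase, PRON = pronoun.
tags = ["B-ANT", "I-ANT", "I-ANT", "O",
        "B-CONN", "I-CONN", "B-PRON", "O", "O", "O", "O", "O"]

spans = decode_spans(tokens, tags)
for span_label, text in spans:
    print(span_label, "->", text)
```

Under this encoding, the antecedent, the connective phrase, and the pronoun are each recovered as a contiguous span, which is the kind of output the task description above calls for.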

Notes

The complex discourse marker is not explicitly annotated. Only the component pronouns and connective phrases are annotated.
https://spacy.io/. Last accessed 4th July 2019.
In this paper, we consider gerund phrases to be noun phrases due to their distributional similarity to the latter.
Available at https://github.com/google-research/bert. Last accessed 3rd July 2019.
Available at https://storage.googleapis.com/bert_models/2018_11_23/multi_cased_L-12_H-768_A-12.zip. Last accessed 26th May 2021. Further details on the derivation of BERT’s multilingual models are presented at https://github.com/google-research/bert/blob/master/multilingual.md. Last accessed 26th May 2021.
Associating each occurrence of ello with a context of 512 neighbouring tokens.
Tagging each sequence of 512 tokens independently of other sequences in the text.
In the literature, this additional layer is usually described as being situated “on top of” the BERT layer.
System to Automatically Classify and Resolve Ello.
We used the implementation made in the scikit-learn machine learning library for Python to compute \(\kappa \) scores.
According to the scale proposed by Viera and Garrett (2005).
By contrast, token T2 in column Pred. class label (Method 3) is not of this type because it is two tokens away from the true start of the antecedent.
Available from http://cs.famaf.unc.edu.ar/~ccardellino/SBWCE/SBW-vectors-300-min5.bin.gz. Last accessed 22nd August 2019. These word embeddings were derived from the Spanish Billion Word corpus, available from http://crscardellino.github.io/SBWCE/. Last accessed 22nd August 2019.
Adjusted from \(\alpha =0.05\) for comparisons between two systems.
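
The agreement statistic mentioned in the notes above, Cohen's \(\kappa\) (Cohen 1960), can be sketched without external dependencies as follows. The two annotator label lists and the label names (NP, CL) are invented for illustration and are not taken from the paper's corpus. The notes state that the scores were computed with scikit-learn, whose `sklearn.metrics.cohen_kappa_score` function computes the same unweighted statistic.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa for two annotators labelling the same items."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("both annotators must label the same non-empty item set")
    n = len(a)
    # Observed agreement: proportion of items given identical labels.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Expected chance agreement from each annotator's label distribution.
    counts_a, counts_b = Counter(a), Counter(b)
    p_e = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical antecedent-type labels from two annotators (NP = noun phrase, CL = clause).
annotator_1 = ["NP", "NP", "NP", "NP", "NP", "NP", "CL", "CL", "CL", "CL"]
annotator_2 = ["NP", "NP", "NP", "NP", "NP", "CL", "CL", "CL", "CL", "NP"]

kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.2f}")
```

Here the annotators agree on 8 of 10 items (observed agreement 0.80) while chance agreement is 0.52, so \(\kappa = (0.80 - 0.52)/(1 - 0.52) \approx 0.58\).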

References

Ariel, M. (1988). Referring and accessibility. Journal of Linguistics, 24, 65–87.
Ariel, M. (1991). The function of accessibility in a theory of grammar. Journal of Pragmatics, 16, 443–463.
Ariel, M. (1999). Cognitive universals and linguistic conventions. The case of resumptive pronouns. Studies in Language, 23, 217–269.
Baum, L. E., & Petrie, T. (1966). Statistical inference for probabilistic functions of finite state Markov chains. The Annals of Mathematical Statistics, 37(6), 1554–1563. https://doi.org/10.1214/aoms/1177699147
Bello, A. (1911). Gramática de la Lengua Castellana. Roger & Chernovitz Editores.
Benveniste, E. (1980). Problemas de Lingüística general. Tomo I. Siglo XXI Editores.
Borreguero, M. (2006). Naturaleza y función de los encapsuladores en los textos informativamente densos (la noticia periodística). Cuadernos de Filología Italiana, 13, 73–95.
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 37–46.
Cornish, F. (1999). Anaphora, discourse, and understanding. Oxford University Press.
Cornish, F. (2008). How indexicals function in texts: Discourse, text, and one neo-Gricean account of indexical reference. Journal of Pragmatics, 40(6), 997.
Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, Minnesota. Retrieved from https://www.aclweb.org/anthology/N19-1423
Fernández, O. (1999). El pronombre personal. Formas y distribuciones. Pronombres átonos y tónicos. In I. Bosque & V. C. Demonte (Eds.), Gramática Descriptiva de la Lengua Española (pp. 1209–1273). Espasa Calpe.
Figueras, C. (2002). La jerarquía de accesibilidad de las expresiones referenciales en español. Revista Española de Lingüística, 32, 53–96.
Francis, N. (1986). Anaphoric nouns. University of Birmingham.
Gómez-Rodríguez, C., & Vilares, D. (2018). Constituent parsing as sequence labeling. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, (pp. 1314–1324). Association for Computational Linguistics. Retrieved from http://aclweb.org/anthology/D18-1162
González-Ruiz, R. (2009). Algunas notas en torno a un mecanismo de cohesión textual: La anáfora conceptual. In Estudios sobre el Texto. Nuevos Enfoques y Propuestas (pp. 247–278). Peter Lang.
Halliday, M., & Hasan, R. (1976). Cohesion in English. Longman.
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735
Honnibal, M., & Johnson, M. (2015). An improved non-monotonic transition system for dependency parsing. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, (pp. 1373–1378). Association for Computational Linguistics, Lisbon, Portugal. Retrieved from https://aclweb.org/anthology/D/D15/D15-1162
Jaynes, E. T. (1957). Information theory and statistical mechanics. Physical Review, 106(4), 620–630. https://doi.org/10.1103/PhysRev.106.620
Kennison, S. (2003). Comprehending the pronouns her, him, and his: Implications for theories of referential processing. Journal of Memory and Language, 49(3), 335–352.
Kennison, S., & Trofe, J. (2003). Comprehending pronouns: A role for word-specific gender stereotype information. Journal of Psycholinguistic Research, 32(3), 355–378.
Kintsch, W. (1998). Comprehension: A paradigm for cognition. Academic Press.
Kudo, T. (2005). CRF++: Yet another CRF toolkit. Retrieved from http://crfpp.sourceforge.net
Lafferty, J., McCallum, A., & Pereira, F. C. (2001). Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In: Proceedings of the 18th International Conference on Machine Learning, (pp. 282–289). Morgan Kaufmann.
Lakshmi, S., Ram, R. V. S., & Sobha, L. D. (2012). Clause Boundary Identification for Malayalam Using CRF. In: Proceedings of the Workshop on Machine Translation and Parsing in Indian Languages (MTPIL-2012), (pp. 83–92). Association for Computational Linguistics, Mumbai, India.
López Samaniego, A. (2011). La categorización de entidades del discurso en la escritura profesional. PhD thesis, Universitat de Barcelona, Barcelona, España.
McCallum, A., & Li, W. (2003). Early results for named entity recognition with conditional random fields, feature induction and web-enhanced lexicons. In: Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003-Volume 4, (pp. 188–191). Association for Computational Linguistics.
Mitkov, R., Evans, R., Orasan, C., Barbu, C., Jones, L., & Sotirova, V. (2000). Coreference and anaphora: Developing annotating tools, annotated resources and annotation strategies. In: Proceedings of DAARC-2000, UK, (pp. 49–58).
Montolío, E. (2013). Construcciones conectivas que encapsulan. [A pesar de + SN] y la escritura experta. Cuadernos AISPI, 2, 115–132.
Montolío, E. (2014). Mecanismos de cohesión (II). Los conectores. In: E.M. (Dir) (ed.) Manual de Escritura Académica y Profesional, (pp. 9–92). Ariel, Barcelona.
Orăsan, C. (2003). PALinkA: A highly customizable tool for discourse annotation. In: Proceedings of the 4th SIGdial Workshop on Discourse and Dialog, (pp. 39–43). Sapporo, Japan. Retrieved from http://clg.wlv.ac.uk/papers/palinka-final.pdf
Parodi, G. (2014). Comprensión de Textos Escritos. Teoría de la Comunicabilidad. Eudeba.
Parodi, G., & Burdiles, G. (2016). Encapsulación y tipos de coherencia referencial y relacional: el pronombre “ello” como mecanismo encapsulador en el discurso escrito de la economía. Onomázein, 33(1), 107–129.
Parodi, G., & Burdiles, G. (2019). Los pronombres neutros ‘esto’, ‘eso’ y ‘aquello’ como mecanismos encapsuladores: coherencia referencial y relacional. Spanish in Context, 16(1), 104–127.
Parodi, G., Julio, C., Nadal, L., Burdiles, G., & Cruz, A. (2018). Always look back: Eye movements as a reflection of anaphoric encapsulation in Spanish while reading the neuter pronoun ello. Journal of Pragmatics, 132, 47–58.
Parodi, G., Julio, C., Nadal, L., Cruz, A., & Burdiles, G. (2019). Stepping back to look ahead: Neuter encapsulation and referent extension in counter-argumentative and causal semantic relations in Spanish. Language and Cognition, 11(3), 431–454.
Portolés, J. (2004). Pragmática para Hispanistas. Longman.
Prandi, M. (2004). The building blocks of meaning. Benjamins.
RAE. (2005). Diccionario Panhispánico de Dudas. Santillana, Bogotá.
RAE & ASALE. (2010). Nueva Gramática de la Lengua Española. Manual [New grammar of Spanish language. Handbook]. Espasa, Buenos Aires.
Recasens, M., Hu, Z., & Rhinehart, O. (2016). Sense anaphoric pronouns: Am I one? In: Proceedings of the Workshop on Coreference Resolution Beyond OntoNotes (CORBON 2016), pp. 1–6. Association for Computational Linguistics, San Diego, California. Retrieved from https://doi.org/10.18653/v1/W16-0701. https://www.aclweb.org/anthology/W16-0701
Rohanian, O., Taslimipoor, S., Yaneva, V., & Ha, L. A. (2017). Using gaze data to predict multiword expressions. In: Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017, (pp. 601–609). INCOMA Ltd., Varna, Bulgaria. Retrieved from https://doi.org/10.26615/978-954-452-049-6_078.
Sanders, T., Spooren, W., & Noordman, L. (1992). Toward a taxonomy of coherence relations. Discourse Processes, 15, 1–35. https://doi.org/10.1080/01638539209544800
Sanders, T., Spooren, W., & Noordman, L. (1993). Coherence relations in a cognitive theory of discourse representation. Cognitive Linguistics, 4(2), 93–134. https://doi.org/10.1515/cogl.1993.4.2.93
Schmid, H. (2000). English abstract nouns as conceptual shells: From corpus to cognition. de Gruyter.
Sha, F., & Pereira, F. (2003). Shallow parsing with conditional random fields. In: Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology-Volume 1, (pp. 134–141). Association for Computational Linguistics.
Shimbo, M., & Hara, K. (2007). A discriminative learning model for coordinate conjunctions. In: Proceedings of the 2007 Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, (pp. 610–619). Prague.
Sinclair, J. (1993). Written discourse structure. In J. Sinclair, M. Hoey, & G. Fox (Eds.), Techniques of description. Spoken and written discourse (pp. 6–31). Routledge.
Sinclair, J. (1994). Trust the text. In M. Coulthard (Ed.), Advances in written text analysis (pp. 6–31). Routledge.
Sutton, C., & McCallum, A. (2011). An introduction to conditional random fields. Foundations and Trends in Machine Learning, 4(4), 268–373.
Tadros, A. (1994). Predictive categories in expository texts. In M. Coulthard (Ed.), Advances in written text analysis (pp. 69–82). Routledge.
Taslimipoor, S., Desantis, A., Cherchi, M., Mitkov, R., & Monti, J. (2016). Language resources for Italian: Towards the development of a corpus of annotated Italian multiword expressions. In: P. Basile, A. Corazza, F. Cutugno, S. Montemagni, M. Nissim, V. Patti, G. Semeraro, R. Sprugnoli (eds.) Proceedings of Third Italian Conference on Computational Linguistics (CLiC-it 2016) & Fifth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2016), CEUR Workshop Proceedings, vol. 1749. Napoli, Italy.
Taslimipoor, S., & Mitkov, R. (2016). Computational phraseology light: Automatic translation of multiword expressions without translation resources. Yearbook of Phraseology, 7(1), 149–166.
van Dijk, T. A., & Kintsch, W. (1983). Strategies of discourse comprehension. Academic Press.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. In: I. Guyon, U.V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, R. Garnett (eds.) Advances in Neural Information Processing Systems 30, (pp. 5998–6008). Curran Associates, Inc. Retrieved from http://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf
Viera, A. J., & Garrett, J. M. (2005). Understanding interobserver agreement: The kappa statistic. Family Medicine, 37(5), 360–363.
Wang, X., Bruno, J., Molloy, H., Evanini, K., & Zechner, K. (2017). Discourse Annotation of Non-native Spontaneous Spoken Responses Using the Rhetorical Structure Theory Framework. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), (pp. 263–268). Association for Computational Linguistics, Vancouver, Canada.
Zulaica, I., & Gutiérrez, J. (2009). Hacia una semántica computacional de las anáforas demostrativas. Linguamática, 1(2), 81–90.

Author information

Authors and Affiliations

Pontificia Universidad Católica de Valparaíso, Valparaíso, Chile
Giovanni Parodi, Cristóbal Jesus Julio Vergara & Raúl Ignacio Olivares-López
Research Institute of Information and Language Processing, University of Wolverhampton, Wulfruna Street, Wolverhampton, West Midlands, WV1 1LY, UK
Richard Evans, Le An Ha & Ruslan Mitkov

Corresponding author

Correspondence to Richard Evans.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Parodi, G., Evans, R., Ha, L.A. et al. A sequence labelling approach for automatic analysis of ello: tagging pronouns, antecedents, and connective phrases. Lang Resources & Evaluation 56, 139–164 (2022). https://doi.org/10.1007/s10579-021-09559-z

Accepted: 06 August 2021
Published: 04 September 2021
Issue Date: March 2022
DOI: https://doi.org/10.1007/s10579-021-09559-z
