A sequence labelling approach for automatic analysis of ello: tagging pronouns, antecedents, and connective phrases

  • Research Article
Language Resources and Evaluation

Abstract

Encapsulators are linguistic units which establish coherent referential connections to the preceding discourse in a text. In this paper, we address the challenge of automatically analysing the pronominal encapsulator ello in Spanish text. For each occurrence, our method identifies the antecedent of the pronoun (including its grammatical type), the connective phrase which combines with the pronoun to express a discourse relation linking the antecedent text segment to the following text segment, and the type of semantic relation expressed by the complex discourse marker formed by the connective phrase and pronoun. We describe our annotation of a corpus to inform the development of our method and to fine-tune an automatic analyser based on Bidirectional Encoder Representations from Transformers (BERT). On testing our method, we find that it performs with greater accuracy than three baselines (0.76 for the resolution task) and sets a promising benchmark for the automatic annotation of occurrences of the pronoun ello, their antecedents, and the semantic relations between the two text segments linked by the connective in combination with the pronoun.
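
To illustrate what a sequence labelling formulation of this analysis might look like, the following minimal sketch assigns one tag per token and then groups the tags back into spans. The BIO-style tag names (ANT, CONN, PRON), the helper decode_spans, and the example sentence are all hypothetical and chosen only for illustration; the label inventory actually used in the article may differ.

```python
from typing import List, Optional, Tuple


def decode_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group contiguous B-/I- tags into (label, text) spans.

    Assumes a hypothetical BIO scheme: "B-X" opens a span of type X,
    "I-X" continues it, and "O" marks tokens outside any span.
    """
    spans: List[Tuple[str, str]] = []
    label: Optional[str] = None
    buffer: List[str] = []

    def flush() -> None:
        # Emit the span collected so far, if there is one.
        if label is not None:
            spans.append((label, " ".join(buffer)))

    for token, tag in zip(tokens, tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != label
        )
        if starts_new:
            flush()
            label, buffer = tag[2:], [token]
        elif tag.startswith("I-"):
            buffer.append(token)
        else:  # "O": close any open span
            flush()
            label, buffer = None, []
    flush()
    return spans


# Invented example: "Inflation rose. Because of that, the bank raised rates."
tokens = ["La", "inflación", "subió", ".", "Por", "ello", ",",
          "el", "banco", "elevó", "las", "tasas", "."]
tags = ["B-ANT", "I-ANT", "I-ANT", "O", "B-CONN", "B-PRON", "O",
        "O", "O", "O", "O", "O", "O"]

spans = decode_spans(tokens, tags)
for span_label, text in spans:
    print(f"{span_label}: {text}")
```

In this toy encoding, the antecedent, the connective phrase, and the pronoun each surface as a labelled span, which is the kind of output a token-level tagger would need to produce for each occurrence of ello.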

Notes

  1. The complex discourse marker is not explicitly annotated. Only the component pronouns and connective phrases are annotated.

  2. https://spacy.io/. Last accessed 4th July 2019.

  3. In this paper, we consider gerund phrases to be noun phrases due to their distributional similarity to the latter.

  4. Available at https://github.com/google-research/bert. Last accessed 3rd July 2019.

  5. Available at https://storage.googleapis.com/bert_models/2018_11_23/multi_cased_L-12_H-768_A-12.zip. Last accessed 26th May 2021. Further details on the derivation of BERT’s multilingual models are presented at https://github.com/google-research/bert/blob/master/multilingual.md. Last accessed 26th May 2021.

  6. Associating each occurrence of ello with a context of 512 neighbouring tokens.

  7. Tagging each sequence of 512 tokens independently of other sequences in the text.

  8. In the literature, this additional layer is usually described as being situated “on top of” the BERT layer.

  9. System to Automatically Classify and Resolve Ello.

  10. We used the implementation in the scikit-learn machine learning library for Python to compute \(\kappa \) scores.

  11. According to the scale proposed by Viera and Garrett (2005).

  12. By contrast, token T2 in column Pred. class label (Method 3) is not of this type because it is two tokens away from the true start of the antecedent.

  13. Available from http://cs.famaf.unc.edu.ar/~ccardellino/SBWCE/SBW-vectors-300-min5.bin.gz. Last accessed 22nd August 2019. These word embeddings were derived from the Spanish Billion Word corpus, available from http://crscardellino.github.io/SBWCE/. Last accessed 22nd August 2019.

  14. Adjusted from \(\alpha =0.05\) for comparisons between two systems.
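
As a rough illustration of the \(\kappa \) statistic mentioned in notes 10 and 11, the sketch below computes Cohen's kappa for two annotators in plain Python. The labels (NP, CLAUSE) and the two annotation sequences are invented for the example and are not data from the article. scikit-learn offers this statistic as sklearn.metrics.cohen_kappa_score; the pure-Python version here is only meant to make the calculation explicit.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Cohen's kappa: (p_o - p_e) / (1 - p_e) for two annotators."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty item set.")
    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected chance agreement from each annotator's label distribution.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(
        (counts_a[label] / n) * (counts_b[label] / n)
        for label in set(counts_a) | set(counts_b)
    )

    if p_e == 1.0:
        # Degenerate case: both annotators used a single identical label.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


# Invented annotations of antecedent type for eight occurrences.
annotator_a = ["NP", "CLAUSE", "CLAUSE", "NP", "CLAUSE", "NP", "CLAUSE", "CLAUSE"]
annotator_b = ["NP", "CLAUSE", "NP", "NP", "CLAUSE", "NP", "CLAUSE", "CLAUSE"]

kappa = cohen_kappa(annotator_a, annotator_b)
print(f"kappa = {kappa:.2f}")
```

Here the annotators agree on 7 of 8 items (observed agreement 0.875) while chance agreement is 0.5, so \(\kappa = (0.875 - 0.5) / (1 - 0.5) = 0.75\).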

References

  • Ariel, M. (1988). Referring and accessibility. Journal of Linguistics, 24, 65–87.

  • Ariel, M. (1991). The function of accessibility in a theory of grammar. Journal of Pragmatics, 16, 443–463.

  • Ariel, M. (1999). Cognitive universals and linguistic conventions. The case of resumptive pronouns. Studies in Language, 23, 217–269.

  • Baum, L. E., & Petrie, T. (1966). Statistical inference for probabilistic functions of finite state Markov chains. The Annals of Mathematical Statistics, 37(6), 1554–1563. https://doi.org/10.1214/aoms/1177699147

  • Bello, A. (1911). Gramática de la Lengua Castellana. Roger & Chernovitz Editores.

  • Benveniste, E. (1980). Problemas de Lingüística general. Tomo I. Siglo XXI Editores.

  • Borreguero, M. (2006). Naturaleza y función de los encapsuladores en los textos informativamente densos (la noticia periodística). Cuadernos de Filología Italiana, 13, 73–95.

  • Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 37–46.

  • Cornish, F. (1999). Anaphora, discourse, and understanding. Oxford University Press.

  • Cornish, F. (2008). How indexicals function in texts: Discourse, text, and one neo-Gricean account of indexical reference. Journal of Pragmatics, 40(6), 997.

  • Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, Minnesota. Retrieved from https://www.aclweb.org/anthology/N19-1423

  • Fernández, O. (1999). El pronombre personal. Formas y distribuciones. Pronombres átonos y tónicos. In I. Bosque & V. C. Demonte (Eds.), Gramática Descriptiva de la Lengua Española (pp. 1209–1273). Espasa Calpe.

  • Figueras, C. (2002). La jerarquía de accesibilidad de las expresiones referenciales en español. Revista Española de Lingüística, 32, 53–96.

  • Francis, N. (1986). Anaphoric nouns. University of Birmingham.

  • Gómez-Rodríguez, C., & Vilares, D. (2018). Constituent parsing as sequence labeling. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, (pp. 1314–1324). Association for Computational Linguistics. Retrieved from http://aclweb.org/anthology/D18-1162

  • González-Ruiz, R. (2009). Algunas notas en torno a un mecanismo de cohesión textual: La anáfora conceptual. Estudios sobre el Texto. Nuevos Enfoques y Propuestas (pp. 247–278). Peter Lang.

  • Halliday, M., & Hasan, R. (1976). Cohesion in English. Longman.

  • Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735

  • Honnibal, M., & Johnson, M. (2015). An improved non-monotonic transition system for dependency parsing. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, (pp. 1373–1378). Association for Computational Linguistics, Lisbon, Portugal. Retrieved from https://aclweb.org/anthology/D/D15/D15-1162

  • Jaynes, E. T. (1957). Information theory and statistical mechanics. Physical Review, 106(4), 620–630. https://doi.org/10.1103/PhysRev.106.620

  • Kennison, S. (2003). Comprehending the pronouns her, him, and his: Implications for theories of referential processing. Journal of Memory and Language, 49(3), 335–352.

  • Kennison, S., & Trofe, J. (2003). Comprehending pronouns: A role for word-specific gender stereotype information. Journal of Psycholinguistic Research, 32(3), 355–378.

  • Kintsch, W. (1998). Comprehension: A paradigm for cognition. Academic Press.

  • Kudo, T. (2005). CRF++: Yet another CRF toolkit. Retrieved from http://crfpp.sourceforge.net

  • Lafferty, J., McCallum, A., & Pereira, F. C. (2001). Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In: Proceedings of the 18th International Conference on Machine Learning, (pp. 282–289). Morgan Kaufmann.

  • Lakshmi, S., Ram, R. V. S., & Sobha, L. D. (2012). Clause Boundary Identification for Malayalam Using CRF. In: Proceedings of the Workshop on Machine Translation and Parsing in Indian Languages (MTPIL-2012), (pp. 83–92). Association for Computational Linguistics, Mumbai, India.

  • López Samaniego, A. (2011). La categorización de entidades del discurso en la escritura profesional. PhD thesis, Universitat de Barcelona, Barcelona, España.

  • McCallum, A., & Li, W. (2003). Early results for named entity recognition with conditional random fields, feature induction and web-enhanced lexicons. In: Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003-Volume 4, (pp. 188–191). Association for Computational Linguistics.

  • Mitkov, R., Evans, R., Orasan, C., Barbu, C., Jones, L., & Sotirova, V. (2000). Coreference and anaphora: Developing annotating tools, annotated resources and annotation strategies. In: Proceedings of DAARC-2000, UK, (pp. 49–58).

  • Montolío, E. (2013). Construcciones conectivas que encapsulan. [A pesar de + SN] y la escritura experta. Cuadernos AISPI, 2, 115–132.

  • Montolío, E. (2014). Mecanismos de cohesión (II). Los conectores. In: E.M. (Dir) (ed.) Manual de Escritura Académica y Profesional, (pp. 9–92). Ariel, Barcelona.

  • Orăsan, C. (2003). PALinkA: A highly customizable tool for discourse annotation. In: Proceedings of the 4th SIGdial Workshop on Discourse and Dialog, (pp. 39–43). Sapporo, Japan. Retrieved from http://clg.wlv.ac.uk/papers/palinka-final.pdf.

  • Parodi, G. (2014). Comprensión de Textos Escritos. Teoría de la Comunicabilidad. Eudeba.

  • Parodi, G., & Burdiles, G. (2016). Encapsulación y tipos de coherencia referencial y relacional: el pronombre “ello” como mecanismo encapsulador en el discurso escrito de la economía. Onomázein, 33(1), 107–129.

  • Parodi, G., & Burdiles, G. (2019). Los pronombres neutros ‘esto’, ‘eso’ y ‘aquello’ como mecanismos encapsuladores: coherencia referencial y relacional. Spanish in Context, 16(1), 104–127.

  • Parodi, G., Julio, C., Nadal, L., Burdiles, G., & Cruz, A. (2018). Always look back: Eye movements as a reflection of anaphoric encapsulation in Spanish while reading the neuter pronoun ello. Journal of Pragmatics, 132, 47–58.

  • Parodi, G., Julio, C., Nadal, L., Cruz, A., & Burdiles, G. (2019). Stepping back to look ahead: Neuter encapsulation and referent extension in counter-argumentative and causal semantic relations in Spanish. Language and Cognition, 11(3), 431–454.

  • Portolés, J. (2004). Pragmática para Hispanistas. Longman.

  • Prandi, M. (2004). The building blocks of meaning. Benjamins.

  • RAE. (2005). Diccionario Panhispánico de Dudas. Santillana, Bogotá.

  • RAE & ASALE. (2010). Nueva Gramática de la Lengua Española. Manual [New grammar of Spanish language. Handbook]. Espasa, Buenos Aires.

  • Recasens, M., Hu, Z., & Rhinehart, O. (2016). Sense anaphoric pronouns: Am I one? In: Proceedings of the Workshop on Coreference Resolution Beyond OntoNotes (CORBON 2016), pp. 1–6. Association for Computational Linguistics, San Diego, California. Retrieved from https://doi.org/10.18653/v1/W16-0701. https://www.aclweb.org/anthology/W16-0701

  • Rohanian, O., Taslimipoor, S., Yaneva, V., & Ha, L. A. (2017). Using gaze data to predict multiword expressions. In: Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017, (pp. 601–609). INCOMA Ltd., Varna, Bulgaria. Retrieved from https://doi.org/10.26615/978-954-452-049-6_078.

  • Sanders, T., Spooren, W., & Noordman, L. (1992). Toward a taxonomy of coherence relations. Discourse Processes, 15, 1–35. https://doi.org/10.1080/01638539209544800

  • Sanders, T., Spooren, W., & Noordman, L. (1993). Coherence relations in a cognitive theory of discourse representation. Cognitive Linguistics, 4(2), 93–134. https://doi.org/10.1515/cogl.1993.4.2.93

  • Schmid, H. (2000). English abstract nouns as conceptual shells: From corpus to cognition. de Gruyter.

  • Sha, F., & Pereira, F. (2003). Shallow parsing with conditional random fields. In: Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology-Volume 1, (pp. 134–141). Association for Computational Linguistics.

  • Shimbo, M., & Hara, K. (2007). A discriminative learning model for coordinate conjunctions. In: Proceedings of the 2007 Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, (pp. 610–619). Prague.

  • Sinclair, J. (1993). Written discourse structure. In J. Sinclair, M. Hoey, & G. Fox (Eds.), Techniques of description. Spoken and written discourse (pp. 6–31). Routledge.

  • Sinclair, J. (1994). Trust the text. In M. Coulthard (Ed.), Advances in written text analysis (pp. 6–31). Routledge.

  • Sutton, C., & McCallum, A. (2011). An introduction to conditional random fields. Foundations and Trends in Machine Learning, 4(4), 268–373.

  • Tadros, A. (1994). Predictive categories in expository texts. In M. Coulthard (Ed.), Advances in written text analysis (pp. 69–82). Routledge.

  • Taslimipoor, S., Desantis, A., Cherchi, M., Mitkov, R., & Monti, J. (2016). Language resources for Italian: Towards the development of a corpus of annotated Italian multiword expressions. In: P. Basile, A. Corazza, F. Cutugno, S. Montemagni, M. Nissim, V. Patti, G. Semeraro, R. Sprugnoli (eds.) Proceedings of Third Italian Conference on Computational Linguistics (CLiC-it 2016) & Fifth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2016), CEUR Workshop Proceedings, vol. 1749. Napoli, Italy.

  • Taslimipoor, S., & Mitkov, R. (2016). Computational phraseology light: Automatic translation of multiword expressions without translation resources. Yearbook of Phraseology, 7(1), 149–166.

  • van Dijk, T. A., & Kintsch, W. (1983). Strategies of discourse comprehension. Academic Press.

  • Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. In: I. Guyon, U.V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, R. Garnett (eds.) Advances in Neural Information Processing Systems 30, (pp. 5998–6008). Curran Associates, Inc. Retrieved from http://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf

  • Viera, A. J., & Garrett, J. M. (2005). Understanding interobserver agreement: The kappa statistic. Family Medicine, 37(5), 360–363.

  • Wang, X., Bruno, J., Molloy, H., Evanini, K., & Zechner, K. (2017). Discourse Annotation of Non-native Spontaneous Spoken Responses Using the Rhetorical Structure Theory Framework. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), (pp. 263–268). Association for Computational Linguistics, Vancouver, Canada.

  • Zulaica, I., & Gutiérrez, J. (2009). Hacia una semántica computacional de las anáforas demostrativas. Linguamática, 1(2), 81–90.

Author information

Corresponding author

Correspondence to Richard Evans.

About this article

Cite this article

Parodi, G., Evans, R., Ha, L.A. et al. A sequence labelling approach for automatic analysis of ello: tagging pronouns, antecedents, and connective phrases. Lang Resources & Evaluation 56, 139–164 (2022). https://doi.org/10.1007/s10579-021-09559-z

  • DOI: https://doi.org/10.1007/s10579-021-09559-z
