Linguistic resources for paraphrase generation in Portuguese: a lexicon-grammar approach

  • Original Paper
Language Resources and Evaluation

Abstract

This paper presents a new linguistic resource for the generation of paraphrases in Portuguese, based on the lexicon-grammar framework. The resource comprises: (i) a lexicon-grammar based dictionary of 2100 predicate nouns co-occurring with the support verb ser de ‘be of’, as in ser de uma ajuda inestimável ‘be of invaluable help’; (ii) a lexicon-grammar based dictionary of 6000 predicate nouns co-occurring with the support verb fazer ‘do’ or ‘make’, as in fazer uma comparação ‘make a comparison’; and (iii) a lexicon-grammar based dictionary of about 5000 human intransitive adjectives co-occurring with the copula verbs ser and/or estar ‘be’, as in ser simpático ‘be kind’ or estar entusiasmado ‘be enthusiastic’. A set of local grammars explores the properties described in these resources, enabling a variety of text transformation tasks for paraphrasing applications. The paper highlights the complementary and synergistic components and the efforts to integrate them, and presents preliminary evaluation results on the inclusion of these resources in the eSPERTo paraphrase generation system.
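
The general idea of a lexicon-driven paraphrase can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the eSPERTo or NooJ implementation: the `LEXICON` table, its three entries (including the verbal equivalents such as comparar), the determiner list, and the `paraphrase_svc` function are all hypothetical, and only infinitive forms without modifiers are handled.

```python
import re
from typing import Optional

# Toy lexicon (illustrative entries only, not taken from the actual resource):
# predicate noun -> (support verb, assumed equivalent full verb, infinitive).
LEXICON = {
    "comparação": ("fazer", "comparar"),
    "análise": ("fazer", "analisar"),
    "pergunta": ("fazer", "perguntar"),
}

# A few determiners that may appear between the support verb and the noun.
DETERMINERS = ("uma", "um", "a", "o")

# Matches "<verb> [determiner] <noun>", e.g. "fazer uma comparação".
SVC_PATTERN = re.compile(
    r"^(?P<verb>\w+)\s+(?:(?:%s)\s+)?(?P<noun>\w+)$" % "|".join(DETERMINERS)
)


def paraphrase_svc(phrase: str) -> Optional[str]:
    """Return the full-verb paraphrase of a support verb construction,
    or None when the phrase is not licensed by the toy lexicon."""
    match = SVC_PATTERN.match(phrase.strip().lower())
    if match is None:
        return None
    entry = LEXICON.get(match.group("noun"))
    # The noun must be listed, and listed with this particular support verb.
    if entry is None or entry[0] != match.group("verb"):
        return None
    return entry[1]


if __name__ == "__main__":
    for example in ("fazer uma comparação", "fazer uma festa"):
        print(example, "->", paraphrase_svc(example))
```

The lookup is keyed on the predicate noun rather than on the verb because, in this sketch, it is the noun entry that records which support verb it combines with; a phrase whose noun is missing from the table, or paired with a different support verb, is left without a paraphrase.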

Notes

  1. http://logos-os.dfki.de/.

  2. http://www.hlt.inesc-id.pt/openlogos/demo.html.

  3. http://esperto.hlt.inesc-id.pt/.

  4. http://www.nooj-association.org/.

  5. http://esperto.hlt.inesc-id.pt/.

  6. https://string.hlt.inesc-id.pt/.

  7. This is one of the most frequent verbs in European Portuguese, both in written texts and in the spoken language. Sentences with support verb constructions are often more frequent than sentences with the equivalent verbal constructions. This is corroborated by Barreiro (2009), who searched the COMPARA parallel corpus (Frankenberg-Garcia & Santos, 2003; Santos & Inácio, 2006) for all sentences in which the infinitive form of fazer occurs with a noun, or with a left modifier and a noun, and found that 47% of the occurrences are support verb constructions.

  8. http://www.linguateca.pt/acesso/corpus.php?corpus=CETEMPUBLICO.

  9. The underscore indicates that two lexical units (preposition and definite article), normally contracted, were split here for clarity.

  10. This is also a paraphrase of O Pedro fez uma festinha à Joana na cara ‘Pedro did a caress to Joana in the face’.

  11. Many support verb constructions can undergo passivization as well.

  12. The asterisk ‘*’ signals that a sentence is unacceptable, while the question mark indicates doubtful acceptability.

  13. Even though variants are also support verbs, they may feature syntactic properties of their own, so a detailed description is in order.

  14. To be precise, the ser de construction not only expresses a human quality but also characterizes an attitude or gesture of the subject towards the human complement, e.g. A atitude/o gesto do Pedro foi de uma certa gentileza ‘Pedro’s attitude/gesture was of a certain kindness’. The sentence with fazer, on the other hand, is not strictly semantically equivalent: the paraphrase involves a regular meaning difference, in which the expression of a human quality is, at the least, less obvious, and only the second interpretation of the ser de construction is kept. Because the difference is systematic, the two can be treated as approximate paraphrases.

References

  • Artstein, R., & Poesio, M. (2008). Survey article: Inter-coder agreement for computational linguistics. Computational Linguistics, 34(4), 555–596. https://doi.org/10.1162/coli.07-034-R2.

  • Baptista, J. (1997). Sermão, tareia e facada: uma classificação das expressões conversas dar-levar. Seminários de Linguística, 1, 5–37.

  • Baptista, J. (2000). Sintaxe dos predicados nominais construídos com o verbo-suporte ser de. Tese de doutoramento, Universidade do Algarve.

  • Baptista, J. (2004). Instrument nouns and fusion: Predicative nouns designating violent actions. In Lexique, Syntaxe et Lexique-Grammaire (Syntax, Lexis and Lexicon-Grammar): Hommage à Maurice Gross, Linguisticae Investigationes Supplementa (pp. 31–40). John Benjamins Publishing Co.

  • Baptista, J. (2005a). Construções simétricas: argumentos e complementos. In O. Figueiredo, G. Rio-Torto, & F. Silva (Eds.), Estudos de homenagem a Mário Vilela (pp. 353–367). Porto: Faculdade de Letras da Universidade do Porto.

  • Baptista, J. (2005b). Sintaxe dos predicados nominais com ‘ser de’. Lisbon: Fundação Calouste Gulbenkian, Fundação para a Ciência e a Tecnologia.

  • Baptista, J., Fernandes, G., Talhadas, R., Dias, F., & Mamede, N. (2015). Implementing European Portuguese Verbal Idioms in a Natural Language Processing System. In Proceedings of conference of the European Society of Phraseology (Europhras 2015), Málaga, Spain (pp. 102–115).

  • Baptista, J., Mamede, N., & Markov, I. (2014). Integrating a Lexicon-Grammar of Verbal Idioms in a Portuguese NLP System. PARSEME General Meeting, Athens, 10–11 March 2014 (poster session).

  • Barreiro, A. (2009). Make it simple with paraphrases: Automated paraphrasing for authoring aids and machine translation. PhD thesis, Universidade do Porto.

  • Barreiro, A. (2011). Spider: A system for paraphrasing in document editing and revision—applicability in machine translation pre-editing. In A. Gelbukh (Ed.), Proceedings of 12th international conference on Computational Linguistics and Intelligent Text Processing (CICLing 2011), Tokyo, Japan, 20–26 February 2011 (pp. 365–376), Part II. Springer.

  • Barreiro, A., Batista, F., Ribeiro, R., Moniz, H., & Trancoso, I. (2014). OpenLogos semantico-syntactic knowledge-rich bilingual dictionaries. In N. Calzolari, K. Choukri, T. Declerck, H. Loftsson, B. Maegaard, J. Mariani, A. Moreno, J. Odijk, & S. Piperidis (Eds.), Proceedings of the ninth international conference on language resources and evaluation (LREC’14). European Language Resources Association (ELRA).

  • Barreiro, A., & Mota, C. (2017). e-PACT: eSPERTo paraphrase aligned corpus of EN-EP/BP translations. Tradução em Revista, 1(22), 87–102.

  • Barreiro, A., & Mota, C. (2018). Paraphrastic variance between European and Brazilian Portuguese. In M. Zampieri, P. Nakov, N. Ljubešić, J. Tiedemann, S. Malmasi, & A. Ali (Eds.), Proceedings of the fifth workshop on NLP for similar languages, varieties and dialects (VarDial) (COLING 2018). Association for Computational Linguistics.

  • Barreiro, A., Rebelo-Arnold, I., Mota, C., Garcez, I., & Baptista, J. (2018, forthcoming). Automatic paraphrasing and normalization of Portuguese informal into formal language. In A. Barreiro, J. Baptista, P. Quaresma & R. Vieira (Eds.), Proceedings of the first workshop on linguistic tools and resources for paraphrasing in Portuguese (POP@PROPOR 2018). Springer.

  • Carvalho, P. (2007). Análise e Representação de Construções Adjectivais para Processamento Automático de Texto. Adjectivos Intransitivos Humanos. PhD thesis, Universidade de Lisboa.

  • Casteleiro, J. M. (1981). Sintaxe transformacional do adjetivo. INIC.

  • Chacoto, L. (2005). O Verbo Fazer em Construções Nominais Predicativas. PhD thesis, Universidade do Algarve.

  • Cohn, T., Callison-Burch, C., & Lapata, M. (2008). Constructing corpora for the development and evaluation of paraphrase systems. Computational Linguistics, 34(4), 597–614. https://doi.org/10.1162/coli.08-003-R1-07-044.

  • D’Agostino, E., & Elia, A. (1998). Il significato delle frasi: un continuum dalle frasi semplici alle forme polirematiche. In AA VV, Ai limiti del linguaggio (pp. 287–310). Laterza.

  • Frankenberg-Garcia, A., & Santos, D. (2003). Introducing COMPARA: The Portuguese-English parallel corpus. In F. Zanettin, S. Bernardini, & D. Stewart (Eds.), Corpora in translator education (pp. 71–87). St. Jerome.

  • Gamallo, P., & Pereira-Fariña, M. (2019). Explorando métodos non-supervisados para calcular a similitude semántica textual. Linguamática, 10(2), 63–68. https://doi.org/10.21814/lm.10.2.275.

  • Gross, G. (1989). Les constructions converses du français. Droz.

  • Gross, M. (1975). Méthodes en syntaxe: régime des constructions complétives. Actualités scientifiques et industrielles. Hermann.

  • Gross, M. (1981). Les bases empiriques de la notion de prédicat sémantique. Langages, 15(63), 7–52.

  • Gross, M. (1982). Une classification des phrases «figées» du français. Revue québécoise de linguistique, 11(2), 151–185.

  • Grycner, A., & Weikum, G. (2016). POLY: Mining relational paraphrases from multilingual sentences. In Proceedings of the 2016 conference on empirical methods in natural language processing (pp. 2183–2192). Association for Computational Linguistics. https://doi.org/10.18653/v1/D16-1236. https://www.aclweb.org/anthology/D16-1236.

  • Guillet, A., & Leclère, C. (1981). Restructuration du groupe nominal. Langages, 1(63), 99–125.

  • Harris, Z. S. (1952). Discourse analysis. Language, 1(28), 1–30.

  • Harris, Z. S. (1964). The elementary transformations. In Papers on Syntax (pp. 211–235). D. Reidel Publishing Company.

  • Harris, Z. S. (1968). Mathematical structures of language. Interscience tracts in pure and applied mathematics, Interscience Publishers.

  • Harris, Z. S. (1976). Notes du Cours de Syntaxe. Seuil.

  • Harris, Z. S. (1981). The elementary transformations (pp. 211–235). Springer.

  • Harris, Z. S. (1991). A theory of language and information: A mathematical approach. Clarendon Press.

  • Harris, Z. S. (1965). Transformational theory. Language, 41(3), 363–401.

  • Janssen, M., Kuhn, T. Z., Ferreira, J. P., & Correia, M. (2018). The CPLP corpus: A pluricentric corpus for the common Portuguese spelling dictionary (VOC). In J. Čibej, V. Gorjanc, I. Kosem, & S. Krek (Eds.), Proceedings of the XVIII EURALEX international congress: Lexicography in global contexts (pp. 835–840). Ljubljana University Press, Faculty of Arts, Ljubljana, Slovenia.

  • Laporte, E., & Voyatzi, S. (2008). An electronic dictionary of French multiword adverbs. In Language resources and evaluation conference. Workshop towards a shared task for multiword expressions (pp. 31–34).

  • Leclère, C. (1995). Sur une restructuration dative. Language Research, 1(31), 179–198.

  • Machonis, P. (2010). English phrasal verbs: from lexicon-grammar to natural language processing. Southern Journal of Linguistics, 34(1), 21–48.

  • Mamede, N., Baptista, J., Diniz, C., & Cabarrão, V. (2012). STRING: A hybrid statistical and rule-based natural language processing chain for Portuguese. In International conference on computational processing of Portuguese (PROPOR 2012), Coimbra, Portugal, vol Demo Session.

  • Mayhew, S., Bicknell, K., Brust, C., McDowell, B., Monroe, W., & Settles, B. (2020). Simultaneous translation and paraphrase for language education. In Proceedings of the fourth workshop on neural generation and translation (pp. 232–243). Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.ngt-1.28. https://www.aclweb.org/anthology/2020.ngt-1.28.

  • Mota, C., Baptista, J., & Barreiro, A. (2019). The lexicon-grammar of predicate nouns with ser de in Port4NooJ. In I. M. Mirto, M. Monteleone, & M. Silberztein (Eds.), Formalizing natural languages with NooJ 2018 and its natural language processing applications (pp. 124–137). Springer. https://doi.org/10.1007/978-3-030-10868-7_12.

  • Mota, C., Barreiro, A., Raposo, F., Ribeiro, R., Curto, S., & Coheur, L. (2016). eSPERTo’s paraphrastic knowledge applied to question-answering and summarization. In L. Barone, M. Monteleone, & M. Silberztein (Eds.), Automatic processing of natural-language electronic texts with NooJ: 10th International Conference (NooJ 2016), České Budějovice, Czech Republic, 9–11 June 2016 (pp. 208–220). Revised Selected Papers. Springer.

  • Mota, C., Carvalho, P., Raposo, F., & Barreiro, A. (2015). Generating paraphrases of human intransitive adjective constructions with Port4NooJ. In T. Okrut, Y. Hetsevich, M. Silberztein, & H. Stanislavenka (Eds.), Automatic processing of natural language electronic texts with NooJ—Selected papers of the 9th international conference (pp. 107–122). Communications in Computer and Information Science. Springer.

  • Mota, C., Chacoto, L., & Barreiro, A. (2018). Integrating the lexicon-grammar of predicate nouns with support verb fazer into Port4NooJ. In S. Mbarki, M. Mourchid & M. Silberztein (Eds.), Formalizing natural languages with NooJ and its natural language processing applications (pp. 29–39). Springer.

  • Paşca, M., & Dienes, P. (2005). Aligning needles in a haystack: Paraphrase acquisition across the web. In Second international joint conference on natural language processing: Full papers. https://doi.org/10.1007/11562214_11. https://www.aclweb.org/anthology/I05-1011.

  • Pershina, M., He, Y., & Grishman, R. (2015). Idiom paraphrases: Seventh heaven vs cloud nine. In Proceedings of the first workshop on linking computational models of lexical, sentential and discourse-level semantics (pp. 76–82). Association for Computational Linguistics. https://doi.org/10.18653/v1/W15-2709. https://www.aclweb.org/anthology/W15-2709.

  • Ranchhod, E. (1983). On the support verbs ser and estar in Portuguese. Lingvisticae Investigationes, 7(2), 317–353.

  • Ranchhod, E. (1990). Sintaxe dos predicados nominais com estar. Linguística, INIC.

  • Rassi, A., Mamede, N., Baptista, J., & OV, I. (2015). Integrating support verb constructions into a parser. In Proceedings of the Symposium in Information and Human Language Technology (STIL’2015) (pp. 57–62).

  • Rassi, A., Santos-Turati, C., Baptista, J., Mamede, N., & Vale, O. (2014). The fuzzy boundaries of operator verb and support verb constructions with dar “give” and ter “have” in Brazilian Portuguese. In Proceedings of the workshop on lexical and grammatical resources for language processing (LG-LP 2014), COLING 2014 (pp. 92–101). Springer.

  • Rassi, A. P. (2015). Descrição, classificação e processamento automático das construções com o verbo dar em português brasileiro. PhD thesis, Universidade Federal de São Carlos, São Carlos-SP.

  • Rassi, A. P., Barros, C. D., & Santos-Turati, M. C. A. (2012). Correlações sintático-semânticas entre as construções com os verbos-suporte ‘dar’, ‘ter’ e ‘fazer’. In Dialogar é preciso: Linguística para o processamento de línguas (pp. 193–206).

  • Rassi, A. P., Barros, C. D., & Santos-Turati, M. C. A. (2013). Tipologia sintática das construções com os verbos-suporte dar, fazer e ter. In Proceedings of III workshop on Portuguese description (pp. 36–43), Fortaleza, Ceará.

  • Rebelo-Arnold, I., Barreiro, A., & Quaresma, P. (2018). EP–BP paraphrastic alignments of verbal constructions involving the clitic pronoun lhe. In A. Barreiro, J. Baptista, P. Quaresma, & R. Vieira (Eds.), Proceedings of the first workshop on linguistic tools and resources for paraphrasing in Portuguese (POP) (PROPOR 2018). Springer.

  • Salkoff, M. (1990). Automatic translation of support verb constructions. In Proceedings of the 13th conference on computational linguistics (COLING ’90) (Vol. 3, pp. 243–246). ACL.

  • Salkoff, M. (1999). A French-English grammar: A contrastive grammar on translational principles. Linguisticae investigationes. John Benjamins.

  • Santos, D. (2014). Como estudar variantes do português e, ao mesmo tempo, construir um português internacional? Presentation at Contact, Variation and Change: Corpora development and analysis of Iberoromance language varieties workshop. http://www.linguateca.pt/Diana/download/VariantesPIGSCP.pdf.

  • Santos, C. (2015a). Construções com verbo-suporte ter no português do Brasil. PhD thesis, Universidade Federal de São Carlos, São Carlos-SP.

  • Santos, D. (2015b). Portuguese language identity in the world: adventures and misadventures of an international language. In E. Khachaturyan (Ed.), Language–Nation–Identity: The questione della lingua in an Italian and non-Italian context (pp. 31–54). Cambridge Scholars Publishing.

  • Santos, D., & Inácio, S. (2006). Annotating COMPARA, a grammar-aware parallel corpus. In N. Calzolari, K. Choukri, A. Gangemi, B. Maegaard, J. Mariani, J. Odijk, & D. Tapias (Eds.), Proceedings of the 5th international conference on language resources and evaluation (LREC 2006) (pp. 1216–1221).

  • Scott, B. (2003). The logos model: An historical perspective. Machine Translation, 18(1), 1–72.

  • Scott, B. (2018). Translation, brains and the computer: A neurolinguistic solution to ambiguity and complexity in machine translation. Machine Translation: Technologies and Applications. Springer.

  • Shinyama, Y., & Sekine, S. (2003). Paraphrase acquisition for information extraction. In Proceedings of the second international workshop on paraphrasing (pp. 65–71). Association for Computational Linguistics. https://doi.org/10.3115/1118984.1118993. https://www.aclweb.org/anthology/W03-1609.

  • Silberztein, M. (1993). Les groupes nominaux productifs et les noms composés lexicalisés. Lingvisticæ Investigationes, 17(2), 405–425.

  • Silberztein, M. (2015). La formalisation des langues: l’approche de NooJ. ISTE.

  • Silberztein, M. (2016). Formalizing natural languages: The NooJ approach. Wiley.

  • Souza, M., & Sanches, L. M. P. (2019). Detecção de paráfrases na língua portuguesa usando sentence embeddings. Linguamática, 10(2), 31–44. https://doi.org/10.21814/lm.10.2.286.

  • Vietri, S. (2004). Lessico-grammatica dell’italiano: metodi, descrizioni, applicazioni. PhD thesis, UTET.

  • Vietri, S. (2010). The formalization of Italian lexicon-grammar tables in a NooJ pair dictionary/grammar. In J. Kuti, M. Silberztein, & T. Váradi (Eds.), Applications of finite-state language processing: Selected papers from the NooJ 2008 International conference (pp. 138–147). Cambridge Scholars Publishing.

Acknowledgements

This research work was supported by Fundação para a Ciência e a Tecnologia (FCT), under projects EXPL/MHC-LIN/2260/2013, UIDB/50021/2020, and UTAP EXPL/EEI-ESS/0031/2014, and post-doctoral grant SFRH/BPD/91446/2012. The authors would like to thank Max Silberztein for his continued support with NooJ since the development of the first version of Port4NooJ.

Author information

Corresponding author

Correspondence to Cristina Mota.

About this article

Cite this article

Barreiro, A., Mota, C., Baptista, J. et al. Linguistic resources for paraphrase generation in portuguese: a lexicon-grammar approach. Lang Resources & Evaluation 56, 1–35 (2022). https://doi.org/10.1007/s10579-021-09561-5

  • DOI: https://doi.org/10.1007/s10579-021-09561-5
