
A Machine Learning Approach to Portuguese Clause Identification

  • Conference paper
Computational Processing of the Portuguese Language (PROPOR 2010)

Abstract

In this work, we apply a machine-learning-based system to Portuguese clause identification and evaluate it. To the best of our knowledge, this is the first machine-learning-based approach to this task. The proposed system is based on Entropy Guided Transformation Learning. To train and evaluate the system, we derive a clause-annotated corpus from the Bosque corpus of the Floresta Sintá(c)tica Project, a European and Brazilian Portuguese treebank. We add part-of-speech (POS) tags to the derived corpus using an automatic state-of-the-art tagger. Additionally, we use a simple heuristic to derive a phrase-chunk-like (PCL) feature from phrases in the Bosque corpus. We train an extractor for this sub-task and use it to automatically add the PCL feature to the derived clause corpus. We use POS and PCL tags as input features in the proposed clause identifier. This system achieves an $F_{\beta=1}$ of 73.90 when using the gold-standard values of the PCL feature. When the automatically extracted values are used, the system obtains $F_{\beta=1} = 69.31$. These are promising results for a first machine learning approach to Portuguese clause identification. Moreover, these results are achieved using a very simple PCL feature, which is generated by a PCL extractor developed with very little modeling effort.
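For readers unfamiliar with the metric reported above, the following is a minimal sketch of how an F score with β = 1 can be computed for clause identification. It assumes that a predicted clause counts as correct only when its start and end positions both match a gold clause exactly; the abstract does not state the scoring procedure, so this matching rule, the function names, and the example spans are all illustrative assumptions rather than the authors' evaluation code.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F_beta)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def clause_scores(gold: set, predicted: set) -> tuple:
    """Precision, recall and F1 over exactly matching (start, end) spans.

    Assumption: a clause is a (start_token, end_token) pair and only
    exact matches are counted as correct.
    """
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall, f_beta(precision, recall)


# Hypothetical clause spans for one sentence (not data from the paper).
gold = {(0, 9), (2, 5), (6, 9)}
predicted = {(0, 9), (2, 5), (7, 9)}

precision, recall, f1 = clause_scores(gold, predicted)
print(f"precision={precision:.4f} recall={recall:.4f} F1={f1:.4f}")
```

With β = 1 the score weights precision and recall equally, so it is high only when the system both finds most gold clauses and proposes few spurious ones. The values 73.90 and 69.31 in the abstract are this score expressed as a percentage.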

This work was partially funded by CNPq and FAPERJ grants 557.128/2009-9 and E-26/170028/2008. The first author was supported by a CNPq doctoral fellowship.



Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Fernandes, E.R., dos Santos, C.N., Milidiú, R.L. (2010). A Machine Learning Approach to Portuguese Clause Identification. In: Pardo, T.A.S., Branco, A., Klautau, A., Vieira, R., de Lima, V.L.S. (eds) Computational Processing of the Portuguese Language. PROPOR 2010. Lecture Notes in Computer Science, vol 6001. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12320-7_8

  • DOI: https://doi.org/10.1007/978-3-642-12320-7_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-12319-1

  • Online ISBN: 978-3-642-12320-7

  • eBook Packages: Computer Science (R0)
