Yet Another Suite of Multilingual NLP Tools

  • Conference paper
Languages, Applications and Technologies (SLATE 2015)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 563)

Abstract

This paper presents the current state of development of a multilingual suite for Natural Language Processing. The suite consists of a sentence chunker, a tokenizer, a PoS-tagger, a dictionary-based lemmatizer and a Named Entity Recognizer (for both enamex and numex expressions). The architecture of the pipeline and the main resources used for its development are described. In addition, the PoS-tagger and the Named Entity Recognizer are evaluated against several state-of-the-art systems. The experiments, performed on Portuguese and English, show that, in spite of its simplicity, our system competes with some well-known NLP tools. It is written entirely in Perl and distributed under a GPL license.
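The order of the modules listed in the abstract can be illustrated with a minimal sketch. It is written in Python for brevity and is not the authors' Perl code: the function names, the toy `LEXICON`, and the capitalization and digit heuristics are all hypothetical stand-ins that only show how a sentence chunker, tokenizer, dictionary-based tagger/lemmatizer and entity recognizer might be chained.

```python
import re

# Hypothetical toy lexicon (word form -> (lemma, PoS tag)); not taken from the paper.
LEXICON = {
    "the": ("the", "DET"),
    "dogs": ("dog", "NOUN"),
    "barked": ("bark", "VERB"),
}

def split_sentences(text):
    """Split after sentence-final punctuation followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize(sentence):
    """Separate word-like strings from punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)

def tag_and_lemmatize(tokens):
    """Dictionary lookup; unknown forms keep their surface form and get 'UNK'."""
    analysis = []
    for tok in tokens:
        lemma, pos = LEXICON.get(tok.lower(), (tok, "UNK"))
        analysis.append((tok, lemma, pos))
    return analysis

def recognize_entities(tokens):
    """Crude stand-in for NER: digits as NUMEX, capitalized non-initial tokens as ENAMEX."""
    entities = []
    for i, tok in enumerate(tokens):
        if tok.isdigit():
            entities.append((tok, "NUMEX"))
        elif i > 0 and tok[0].isupper():
            entities.append((tok, "ENAMEX"))
    return entities

def run_pipeline(text):
    """Run each module in sequence over every sentence of the input text."""
    results = []
    for sentence in split_sentences(text):
        tokens = tokenize(sentence)
        results.append({
            "tokens": tokens,
            "analysis": tag_and_lemmatize(tokens),
            "entities": recognize_entities(tokens),
        })
    return results
```

Calling `run_pipeline("The dogs barked at Maria. She paid 20 euros.")` returns one record per sentence. A real system would replace the lookup table and heuristics with the statistical tagger and the resources described in the paper.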

Notes

  1. http://proxectos.citius.usc.es/hpcpln/index.php.

  2. http://opennlp.apache.org/.

  3. http://www.freebase.com.

  4. http://www.dbpedia.org.

  5. http://www.wikipedia.org.

  6. http://www.linguateca.pt/floresta/CoNLL-X/.

  7. ftp://ftp.cis.upenn.edu/pub/xtag/morph-1.5/morph-1.5.tar.gz.

  8. http://clu.uni.no/icame/brown/bcm.html.

  9. http://www.itl.nist.gov/iad/894.01/tests/ie-er/er_99/er_99.htm.

  10. http://www.gabormelli.com/RKB/SemCor_Corpus.

  11. The output of each system, as well as the gold-standard files, can be obtained at the following URL: http://gramatica.usc.es/~marcos/slate15.zip.

  12. http://opennlp.sourceforge.net/models/english/namefind/.

  13. http://nlp.stanford.edu/software/conll.distsim.iob2.crf.ser.gz.

Corresponding author

Correspondence to Marcos Garcia.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Garcia, M., Gamallo, P. (2015). Yet Another Suite of Multilingual NLP Tools. In: Sierra-Rodríguez, JL., Leal, JP., Simões, A. (eds) Languages, Applications and Technologies. SLATE 2015. Communications in Computer and Information Science, vol 563. Springer, Cham. https://doi.org/10.1007/978-3-319-27653-3_7

  • DOI: https://doi.org/10.1007/978-3-319-27653-3_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-27652-6

  • Online ISBN: 978-3-319-27653-3

  • eBook Packages: Computer Science, Computer Science (R0)
