Named Entity Recognition in Portuguese Neurology Text Using CRF

  • Conference paper
Progress in Artificial Intelligence (EPIA 2019)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11804)

Abstract

Automatic recognition of named entities in clinical text lightens the work of health professionals: it helps with interpretation and eases tasks such as populating databases with patient health information. In this study, we evaluated the performance of Conditional Random Fields (CRF), a sequence labelling model, for extracting entities from neurology clinical texts written in Portuguese. Besides achieving F1-scores of about 73% or 80%, respectively for a relaxed or strict evaluation, we also analyzed the most discriminant features for this task.
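To make the sequence-labelling setup more concrete, the following is a minimal sketch of how per-token features for a CRF could be built. It is not the feature set used in the paper, which this preview does not list. The function names (`token_features`, `sentence_features`), the chosen features, the example sentence and the `SYMPTOM` label are all hypothetical and serve only as illustration.

```python
def token_features(tokens, i):
    """Return a feature dict for tokens[i] (illustrative features only)."""
    word = tokens[i]
    feats = {
        "bias": 1.0,
        "lower": word.lower(),           # lowercased surface form
        "suffix3": word[-3:].lower(),    # last three characters
        "is_title": word.istitle(),      # starts with a capital letter
        "is_upper": word.isupper(),      # all caps, e.g. acronyms
        "is_digit": word.isdigit(),      # purely numeric token
    }
    # Context features: neighbouring words, or sentence-boundary flags.
    if i > 0:
        feats["prev_lower"] = tokens[i - 1].lower()
    else:
        feats["BOS"] = True
    if i < len(tokens) - 1:
        feats["next_lower"] = tokens[i + 1].lower()
    else:
        feats["EOS"] = True
    return feats


def sentence_features(tokens):
    """Build one feature dict per token of a sentence."""
    return [token_features(tokens, i) for i in range(len(tokens))]


# Hypothetical example sentence with made-up BIO labels (not from the paper).
tokens = ["Doente", "com", "cefaleia", "intensa", "."]
labels = ["O", "O", "B-SYMPTOM", "O", "O"]

X = sentence_features(tokens)
for tok, lab, feats in zip(tokens, labels, X):
    print(tok, lab, sorted(feats))
```

A list of such feature dicts per sentence, paired with one label per token, is the kind of input that CRF toolkits commonly accept; the trained model then assigns the most likely label sequence to a whole sentence rather than classifying each token in isolation.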

Notes

  1. Distributional semantic models, or word embeddings, are typically learned from large collections of text and represent words as vectors of numbers, based on their distribution in text. This positions words in a vector space and makes several processing tasks easier, such as computing semantic similarity as the cosine of the angle between word vectors.

  2. http://www.sinapse.pt/archive.php

  3. https://sklearn-crfsuite.readthedocs.io/en/latest/index.html
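The cosine-based similarity mentioned in Note 1 can be sketched in a few lines. This is a plain-Python illustration under stated assumptions: the three-dimensional vectors below are invented toy values, not embeddings learned from any corpus, and `cosine_similarity` is a hypothetical helper name.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy vectors, invented for illustration only (real embeddings typically
# have hundreds of dimensions and are learned from text).
vectors = {
    "cefaleia": [0.9, 0.1, 0.3],
    "enxaqueca": [0.8, 0.2, 0.4],
    "joelho": [0.1, 0.9, 0.2],
}

sim_related = cosine_similarity(vectors["cefaleia"], vectors["enxaqueca"])
sim_unrelated = cosine_similarity(vectors["cefaleia"], vectors["joelho"])
print(f"cefaleia vs enxaqueca: {sim_related:.3f}")
print(f"cefaleia vs joelho:    {sim_unrelated:.3f}")
```

With these toy values the two headache-related words score higher than the unrelated pair, which is the behaviour one would hope for from embeddings trained on real text.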

References

  1. Bojanowski, P., Grave, E., Joulin, A., Mikolov, T.: Enriching word vectors with subword information. Trans. Assoc. Comput. Linguist. 5, 135–146 (2017)

  2. Bouma, G.: Normalized (pointwise) mutual information in collocation extraction. In: Proceedings of GSCL, pp. 31–40 (2009)

  3. Ferreira, L., Teixeira, A.J.S., Cunha, J.P.: Information extraction from Portuguese hospital discharge letters. In: VI Jornadas en Technologia del Habla and II Iberian SL Tech Workshop, pp. 39–42, January 2010

  4. Ferreira, L.d.S.: Medical information extraction in European Portuguese. Ph.D. thesis, Universidade de Aveiro (2011)

  5. Gold, S., Elhadad, N., Zhu, X., Cimino, J.J., Hripcsak, G.: Extracting structured medication event information from discharge summaries. In: AMIA Annual Symposium Proceedings, vol. 2008, pp. 237–241. American Medical Informatics Association (2008)

  6. Henriksson, A., Dalianis, H., Kowalski, S.: Generating features for named entity recognition by learning prototypes in semantic space: the case of de-identifying health records. In: 2014 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 450–457. IEEE (2014)

  7. Klinger, R., Tomanek, K.: Classical probabilistic models and conditional random fields. Technical report TR07-2-013, Department of Computer Science, Dortmund University of Technology (2007). https://ls11-www.cs.uni-dortmund.de/_media/techreports/tr07-13.pdf

  8. Lamy, M., Pereira, R., Ferreira, J.C., Vasconcelos, J.B., Melo, F., Velez, I.: Extracting clinical information from electronic medical records. In: Novais, P., et al. (eds.) ISAmI2018 2018. AISC, vol. 806, pp. 113–120. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-01746-0_13

  9. Mykowiecka, A., Marciniak, M., Kupść, A.: Rule-based information extraction from patients clinical data. J. Biomed. Inform. 42(5), 923–936 (2009)

  10. Névéol, A., Dalianis, H., Velupillai, S., Savova, G., Zweigenbaum, P.: Clinical natural language processing in languages other than English: opportunities and challenges. J. Biomed. Seman. 9(1), 12 (2018)

  11. Rais, M., Lachkar, A., Lachkar, A., Ouatik, S.E.A.: A comparative study of biomedical named entity recognition methods based machine learning approach. In: 2014 Third IEEE International Colloquium in Information Science and Technology (CIST), pp. 329–334. IEEE (2014)

  12. Rehurek, R., Sojka, P.: Software framework for topic modelling with large corpora. In: Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, pp. 45–50. ELRA, Valletta, Malta (2010)

  13. Rodrigues, R., Oliveira, H.G., Gomes, P.: NLPPort: a pipeline for Portuguese NLP (Short paper). In: 7th Symposium on Languages, Applications and Technologies (SLATE 2018). OpenAccess Series in Informatics (OASIcs), vol. 62, pp. 18:1–18:9. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany (2018). https://doi.org/10.4230/OASIcs.SLATE.2018.18

  14. Russell, S.J., Norvig, P.: Probabilistic reasoning over time. In: Artificial Intelligence: A Modern Approach, Chap. 15, pp. 566–636, 3rd edn. Pearson, London (2010)

  15. Sha, F., Pereira, F.: Shallow parsing with conditional random fields. In: Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology-Volume 1, pp. 134–141. Association for Computational Linguistics (2003)

  16. Sinapse: Publicações da Sociedade Portuguesa de Neurologia, vol. 17:1. Sociedade Portuguesa de Neurologia, Lisbon (2017)

  17. Sinapse: Publicações da Sociedade Portuguesa de Neurologia, vol. 17:2. Sociedade Portuguesa de Neurologia, Lisbon (2017)

  18. Skeppstedt, M., Kvist, M., Dalianis, H.: Rule-based entity recognition and coverage of SNOMED CT in Swedish clinical text. In: LREC, pp. 1250–1257 (2012)

  19. Skeppstedt, M., Kvist, M., Nilsson, G.H., Dalianis, H.: Automatic recognition of disorders, findings, pharmaceuticals and body structures from clinical text: an annotation and machine learning study. J. Biomed. Inform. 49, 148–158 (2014)

  20. Tjong Kim Sang, E.F., De Meulder, F.: Introduction to the CoNLL-2003 shared task: language-independent named entity recognition. In: Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003 - Volume 4, CONLL 2003, pp. 142–147. Association for Computational Linguistics, Stroudsburg (2003). https://doi.org/10.3115/1119176.1119195

  21. Wang, Y., et al.: Supervised methods for symptom name recognition in free-text clinical records of traditional Chinese medicine: an empirical study. J. Biomed. Inform. 47, 91–104 (2014)

  22. Wu, Y., Xu, J., Jiang, M., Zhang, Y., Xu, H.: A study of neural word embeddings for named entity recognition in clinical text. In: AMIA Annual Symposium Proceedings, vol. 2015, pp. 1326–1333. American Medical Informatics Association (2015)

Acknowledgements

We acknowledge the financial support of Fundação para a Ciência e a Tecnologia through CISUC (UID/CEC/00326/2019).

Author information

Corresponding authors

Correspondence to Fábio Lopes, César Teixeira or Hugo Gonçalo Oliveira.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Lopes, F., Teixeira, C., Gonçalo Oliveira, H. (2019). Named Entity Recognition in Portuguese Neurology Text Using CRF. In: Moura Oliveira, P., Novais, P., Reis, L. (eds) Progress in Artificial Intelligence. EPIA 2019. Lecture Notes in Computer Science, vol 11804. Springer, Cham. https://doi.org/10.1007/978-3-030-30241-2_29

  • DOI: https://doi.org/10.1007/978-3-030-30241-2_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-30240-5

  • Online ISBN: 978-3-030-30241-2

  • eBook Packages: Computer Science, Computer Science (R0)
