
Part of the book series: Studies in Computational Intelligence (SCI, volume 467)

Abstract

The paper presents a two-step method for extracting terminology from Polish texts. The first step identifies term candidates and is supported by linguistic knowledge: a shallow grammar for the extracted phrases is given. The second step is statistical: candidates for domain terms are ranked and filtered using the C-value method and phrases extracted from general Polish texts. The approach is also sensitive to terms expressed as subphrases of longer phrases. We applied the method to economics texts and describe the results of the experiment. The paper closes with an evaluation and a discussion of the results.
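The C-value ranking used in the second step can be sketched as follows. This is a minimal Python illustration of the original C-value measure of Frantzi, Ananiadou and Mima (2000), not the authors' implementation. For a candidate a of |a| words with corpus frequency f(a), C-value(a) = log2|a| · f(a) if a is not nested in any longer candidate, and otherwise log2|a| · (f(a) - (1/|T_a|) · Σ_{b ∈ T_a} f(b)), where T_a is the set of longer candidates that contain a. This subtraction is what lets a phrase that is frequent on its own keep a high score even when it also occurs as a subphrase. The function names `is_nested` and `c_value`, the assumption that candidates arrive as tuples of already normalised (base-form) tokens, and the toy frequencies are hypothetical. The chapter's own treatment of single-word terms (for which log2 1 = 0 in the original formula) and its filtering against general Polish texts are not reproduced here.

```python
import math


def is_nested(short, long_):
    """Return True if token tuple `short` occurs contiguously inside the longer tuple `long_`."""
    n, m = len(short), len(long_)
    if n >= m:
        # A candidate is never nested in itself or in a shorter candidate.
        return False
    return any(long_[i:i + n] == short for i in range(m - n + 1))


def c_value(freqs):
    """Compute original C-value scores (Frantzi et al., 2000) for candidate terms.

    `freqs` maps each candidate (a tuple of normalised tokens, assumed to be
    produced by an earlier grammar-based step) to its corpus frequency.
    Returns a dict mapping each candidate to its score.
    """
    scores = {}
    for a, f_a in freqs.items():
        # T_a: all longer candidates containing `a` as a contiguous subphrase.
        containers = [b for b in freqs if is_nested(a, b)]
        weight = math.log2(len(a))  # equals 0 for single-word candidates
        if not containers:
            scores[a] = weight * f_a
        else:
            # Discount the average frequency of the longer phrases containing `a`.
            nested_mean = sum(freqs[b] for b in containers) / len(containers)
            scores[a] = weight * (f_a - nested_mean)
    return scores


if __name__ == "__main__":
    # Toy, hypothetical frequencies (not data from the chapter).
    freqs = {
        ("stopa", "procentowa"): 10,
        ("referencyjna", "stopa", "procentowa"): 4,
        ("podstawowa", "stopa", "procentowa"): 2,
    }
    for term, score in sorted(c_value(freqs).items(), key=lambda kv: -kv[1]):
        print(" ".join(term), round(score, 2))
```

With these toy counts, the nested phrase "stopa procentowa" ("interest rate") scores 7.0. It occurs 10 times, and its two longer containers occur 3 times on average. Because the phrase appears far more often on its own than inside those containers, it stays at the top of the ranking, illustrating how the measure can surface terms that also appear as subphrases.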



Author information

Correspondence to Małgorzata Marciniak.


Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Marciniak, M., Mykowiecka, A. (2013). Terminology Extraction from Domain Texts in Polish. In: Bembenik, R., Skonieczny, L., Rybinski, H., Kryszkiewicz, M., Niezgodka, M. (eds) Intelligent Tools for Building a Scientific Information Platform. Studies in Computational Intelligence, vol 467. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35647-6_13


  • DOI: https://doi.org/10.1007/978-3-642-35647-6_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-35646-9

  • Online ISBN: 978-3-642-35647-6

  • eBook Packages: Engineering, Engineering (R0)
