Abstract
The paper presents a two-step method for extracting terminology from Polish texts. The first step identifies term candidates and is supported by linguistic knowledge: a shallow grammar used to extract the candidate phrases is given. The second step is statistical: candidates for domain terms are ranked and filtered using the C-value method and phrases extracted from general Polish texts. The approach is designed to find terms that are also expressed as subphrases of longer phrases. We applied the method to economics texts and describe the results of this experiment. The paper closes with an evaluation and a discussion of the results.
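To make the statistical step concrete, the following is a minimal sketch of C-value ranking in the form defined by Frantzi, Ananiadou and Mima (2000): for a candidate a of length |a| words with frequency f(a), C-value(a) = log2|a| · f(a) if a is not nested in any other candidate, and otherwise log2|a| · (f(a) − (1/|T_a|) Σ_{b ∈ T_a} f(b)), where T_a is the set of longer candidates containing a. Subtracting the average frequency of the containing phrases is what lets a subphrase such as "stopa procentowa" still rank as a term. Several things here are assumptions, not details taken from the chapter: candidates are represented as tuples of lemmas, the frequency table is made up, and the helpers `contains`, `c_value` and `rank_terms` are hypothetical names. The filter against general Polish phrases is also a guess: it simply discards candidates found in a hypothetical `general_phrases` set, and the chapter's actual filtering criterion may differ. Note that with the original log2|a| factor, single-word candidates always score 0; any modification the authors made for one-word terms is not reproduced here.

```python
import math
from collections import Counter

# Assumption: a candidate phrase is a tuple of lemmas (base forms).
Phrase = tuple[str, ...]


def contains(longer: Phrase, shorter: Phrase) -> bool:
    """True if `shorter` is a proper contiguous subphrase of `longer`."""
    n, m = len(longer), len(shorter)
    if m >= n:
        return False
    return any(longer[i:i + m] == shorter for i in range(n - m + 1))


def c_value(freqs: Counter) -> dict[Phrase, float]:
    """C-value of every candidate, following Frantzi et al. (2000)."""
    scores: dict[Phrase, float] = {}
    for a, f_a in freqs.items():
        # T_a: longer candidates in which `a` occurs as a subphrase.
        containers = [b for b in freqs if contains(b, a)]
        length_factor = math.log2(len(a))  # 0 for single-word candidates
        if not containers:
            scores[a] = length_factor * f_a
        else:
            # Subtract the average frequency of the containing candidates.
            nested = sum(freqs[b] for b in containers) / len(containers)
            scores[a] = length_factor * (f_a - nested)
    return scores


def rank_terms(freqs: Counter, general_phrases: set[Phrase],
               threshold: float = 0.0) -> list[tuple[Phrase, float]]:
    """Rank candidates by C-value, dropping phrases seen in general texts.

    Hypothetical filter: a candidate is discarded if it occurs in
    `general_phrases` or its C-value is not above `threshold`.
    """
    scores = c_value(freqs)
    kept = [(p, s) for p, s in scores.items()
            if p not in general_phrases and s > threshold]
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(kept, key=lambda ps: (-ps[1], ps[0]))


# Made-up counts for illustration only (not data from the chapter).
freqs = Counter({
    ("stopa", "procentowy"): 10,
    ("stopa", "procentowy", "kredyt"): 4,
    ("stopa", "procentowy", "lokata"): 2,
    ("bank", "centralny"): 6,
    ("ten", "rok"): 8,
})
general = {("ten", "rok")}

ranked = rank_terms(freqs, general)
for phrase, score in ranked:
    print(" ".join(phrase), round(score, 2))
```

In this toy input, "stopa procentowy" keeps the top score (1 · (10 − 3) = 7) even though it also occurs inside two longer candidates, while the frequent but general "ten rok" is removed by the filter rather than by its score.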
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Marciniak, M., Mykowiecka, A. (2013). Terminology Extraction from Domain Texts in Polish. In: Bembenik, R., Skonieczny, L., Rybinski, H., Kryszkiewicz, M., Niezgodka, M. (eds) Intelligent Tools for Building a Scientific Information Platform. Studies in Computational Intelligence, vol 467. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35647-6_13
DOI: https://doi.org/10.1007/978-3-642-35647-6_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35646-9
Online ISBN: 978-3-642-35647-6
eBook Packages: Engineering, Engineering (R0)