Multilingual Text Classification Using Ontologies

  • Conference paper
Advances in Information Retrieval (ECIR 2007)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4425)

Abstract

In this paper, we investigate strategies for automatically classifying documents written in different languages, whether thematically, geographically, or according to other criteria. We present a novel, linguistically motivated text representation scheme that can be used with machine learning algorithms to learn classifications from pre-classified examples and then automatically classify documents that may be written in entirely different languages. Our approach makes use of ontologies and lexical resources, but goes beyond a simple mapping from terms to concepts: it fully exploits the external knowledge manifested in such resources and maps terms to entire regions of concepts. To this end, a graph traversal algorithm explores related concepts that may be relevant. Extensive testing has shown that our methods lead to significant improvements over existing approaches.
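The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes a tiny invented bilingual lexicon (`LEXICON`, hypothetical) that maps language-specific terms to shared concept IDs, and an invented concept graph (`GRAPH`, hypothetical) whose edges stand in for relations such as hypernymy. Each term's concepts are expanded by breadth-first traversal, with a hypothetical `decay` factor so that more distant concepts receive less weight. A simple nearest-centroid classifier with cosine similarity stands in for the machine learning algorithm; it is a swapped-in technique chosen for brevity.

```python
import math
from collections import Counter, defaultdict, deque

# Hypothetical toy lexicon: (language, term) -> language-independent concept IDs.
# A real system would draw these mappings from WordNet-style lexical resources.
LEXICON = {
    ("en", "dog"): ["c_dog"], ("de", "hund"): ["c_dog"],
    ("en", "cat"): ["c_cat"], ("de", "katze"): ["c_cat"],
    ("en", "stock"): ["c_stock"], ("de", "aktie"): ["c_stock"],
    ("en", "market"): ["c_market"], ("de", "markt"): ["c_market"],
}

# Hypothetical concept graph: concept -> related (here: more general) concepts.
GRAPH = {
    "c_dog": ["c_animal"],
    "c_cat": ["c_animal"],
    "c_animal": ["c_organism"],
    "c_stock": ["c_finance"],
    "c_market": ["c_finance"],
}


def concept_vector(tokens, lang, max_depth=2, decay=0.5):
    """Map tokens to concepts, then spread weight to related concepts via BFS.

    A concept reached at distance d from a term's own concept gets weight decay**d.
    """
    vec = Counter()
    for tok in tokens:
        for start in LEXICON.get((lang, tok.lower()), []):
            seen = {start}
            queue = deque([(start, 0)])
            while queue:
                concept, depth = queue.popleft()
                vec[concept] += decay ** depth
                if depth < max_depth:
                    for nxt in GRAPH.get(concept, []):
                        if nxt not in seen:
                            seen.add(nxt)
                            queue.append((nxt, depth + 1))
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def train_centroids(examples):
    """examples: list of (tokens, lang, label); returns label -> summed concept vector."""
    centroids = defaultdict(Counter)
    for tokens, lang, label in examples:
        centroids[label].update(concept_vector(tokens, lang))
    return centroids


def classify(tokens, lang, centroids):
    """Assign the label whose centroid is most similar in concept space."""
    vec = concept_vector(tokens, lang)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


if __name__ == "__main__":
    # Train on English examples only, then classify German documents.
    centroids = train_centroids([
        (["dog", "cat"], "en", "pets"),
        (["stock", "market"], "en", "economy"),
    ])
    print(classify(["Hund", "Katze"], "de", centroids))  # pets
    print(classify(["Aktie"], "de", centroids))          # economy
```

Because training and test documents are both projected into the same concept space, a classifier trained only on English examples can label German input. The traversal also lets documents that share no concept directly (for example "Aktie" versus "market") still overlap through a common related concept.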

Editor information

Giambattista Amati, Claudio Carpineto, Giovanni Romano

Copyright information

© 2007 Springer Berlin Heidelberg

About this paper

Cite this paper

de Melo, G., Siersdorfer, S. (2007). Multilingual Text Classification Using Ontologies. In: Amati, G., Carpineto, C., Romano, G. (eds) Advances in Information Retrieval. ECIR 2007. Lecture Notes in Computer Science, vol 4425. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71496-5_49

  • DOI: https://doi.org/10.1007/978-3-540-71496-5_49

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-71494-1

  • Online ISBN: 978-3-540-71496-5

  • eBook Packages: Computer Science, Computer Science (R0)
