A Data-Compression Approach to the Monolingual GIRT Task: An Agnostic Point of View

  • Conference paper
Comparative Evaluation of Multilingual Information Access Systems (CLEF 2003)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3237)

Abstract

In this paper we apply a data-compression IR method to the GIRT social science database, focusing on the monolingual tasks in German and English. For this purpose we use a recently proposed general scheme for context recognition and context classification of strings of characters (in particular texts) or other coded information. The key point of the method is the computation of a suitable measure of remoteness (or similarity) between two strings of characters. This measure reflects the distance between the structures present in the two strings, i.e. between the distributions of elements in the two compared sequences. The hypothesis is that this information-theoretic measure of remoteness between two sequences can reflect their semantic distance. It is worth stressing the generality and versatility of our information-theoretic method, which applies to any corpus of character strings regardless of the underlying coding (i.e. the language).
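
The abstract does not say which compressor or which formula the authors use, so the following is only a minimal sketch of how a compression-based remoteness between two strings could be computed. It assumes Python's standard zlib (DEFLATE) compressor. The function names (compressed_size, extra_cost, remoteness, rank) and the normalization by probe length are illustrative assumptions, not the paper's definitions. The idea is to measure how many extra compressed bytes a probe string costs when appended to a reference string, and to compare that with the cost of appending the probe to itself.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of `data` after zlib (DEFLATE) compression."""
    return len(zlib.compress(data, 9))


def extra_cost(reference: bytes, probe: bytes) -> int:
    """Extra compressed bytes needed to encode `probe` appended to `reference`."""
    return compressed_size(reference + probe) - compressed_size(reference)


def remoteness(reference: str, probe: str) -> float:
    """Assumed remoteness of `probe` from `reference`; 0.0 when they are identical.

    This normalization is an illustrative choice, not the paper's formula.
    """
    ref_bytes = reference.encode("utf-8")
    probe_bytes = probe.encode("utf-8")
    if not probe_bytes:
        raise ValueError("probe must be non-empty")
    cross = extra_cost(ref_bytes, probe_bytes)        # probe coded after the reference
    baseline = extra_cost(probe_bytes, probe_bytes)   # probe coded after itself
    return (cross - baseline) / len(probe_bytes)


def rank(documents: dict[str, str], query: str) -> list[str]:
    """Document names ordered from least to most remote from `query`."""
    return sorted(documents, key=lambda name: remoteness(documents[name], query))


if __name__ == "__main__":
    # Made-up sample texts, not taken from the GIRT collection.
    docs = {
        "en": "Studies of the labour market show that unemployment among "
              "young people is rising in industrial regions.",
        "de": "Untersuchungen zum Arbeitsmarkt zeigen, dass die "
              "Jugendarbeitslosigkeit in Industrieregionen steigt.",
    }
    query = "unemployment among young people in the labour market"
    for name in rank(docs, query):
        print(name, round(remoteness(docs[name], query), 3))
```

The sketch never inspects words or grammar, which is in line with the language-agnostic claim of the abstract. One practical caveat: DEFLATE only looks back over a 32 KiB window, so with this particular compressor a long reference text would have to be truncated or split before probing.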

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Alderuccio, D., Bordoni, L., Loreto, V. (2004). A Data-Compression Approach to the Monolingual GIRT Task: An Agnostic Point of View. In: Peters, C., Gonzalo, J., Braschler, M., Kluck, M. (eds) Comparative Evaluation of Multilingual Information Access Systems. CLEF 2003. Lecture Notes in Computer Science, vol 3237. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30222-3_38

  • DOI: https://doi.org/10.1007/978-3-540-30222-3_38

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-24017-4

  • Online ISBN: 978-3-540-30222-3

  • eBook Packages: Springer Book Archive
