Evaluating Geographical Knowledge Re-Ranking, Linguistic Processing and Query Expansion Techniques for Geographical Information Retrieval

  • Conference paper
String Processing and Information Retrieval (SPIRE 2015)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9309)

Included in the following conference series:

  • International Symposium on String Processing and Information Retrieval

Abstract

This paper describes and evaluates the use of Geographical Knowledge Re-Ranking, Linguistic Processing, and Query Expansion techniques to improve the effectiveness of Geographical Information Retrieval. Geographical Knowledge Re-Ranking uses Geographical Gazetteers and conservative Toponym Disambiguation techniques to boost the ranking of the geographically relevant documents retrieved by standard state-of-the-art Information Retrieval algorithms. Linguistic Processing is performed in two ways: 1) Part-of-Speech tagging and Named Entity Recognition and Classification are applied to the text collections and topics to detect toponyms; 2) Stemming (Porter's algorithm) and Lemmatization are applied in combination with default stopword filtering. The Query Expansion methods tested are the Bose-Einstein (Bo1) and Kullback-Leibler term weighting models. The experiments use the English Monolingual test collections of the GeoCLEF evaluations (2005, 2006, 2007, and 2008), with the TF-IDF, BM25, and InL2 Information Retrieval algorithms over unprocessed texts as baselines. They have been run on each GeoCLEF test collection separately (25 topics per evaluation) and on the fusion of all four collections (100 topics). When Geographical Knowledge Re-Ranking, Linguistic Processing (lemmatization, stemming, and the combination of both), and Query Expansion are evaluated separately on the fusion of all the topics, each of them improves the Mean Average Precision (MAP) and R-Precision effectiveness measures in all the experiments, and the improvement over the baselines is statistically significant in most of them. The best MAP and R-Precision results are obtained with the InL2 algorithm combined with Geographical Knowledge Re-Ranking, Lemmatization with Stemming, and Kullback-Leibler Query Expansion. Some configurations with Geographical Knowledge Re-Ranking, Linguistic Processing, and Query Expansion improve on the MAP of the best official results at the GeoCLEF evaluations of 2005, 2006, and 2007.
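The two effectiveness measures named in the abstract, Mean Average Precision and R-Precision, can be illustrated with a minimal sketch. It uses the standard textbook definitions with binary relevance judgments; it is not the evaluation tooling used in the paper, and the function names, document identifiers, and toy data are hypothetical. Rankings are assumed to contain no duplicate documents.

```python
from typing import Callable, Dict, List, Set


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Mean of the precision values at each rank where a relevant document occurs."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def r_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Precision at rank R, where R is the number of relevant documents for the topic."""
    if not relevant:
        return 0.0
    r = len(relevant)
    return sum(1 for doc_id in ranking[:r] if doc_id in relevant) / r


def mean_over_topics(
    metric: Callable[[List[str], Set[str]], float],
    runs: Dict[str, List[str]],
    qrels: Dict[str, Set[str]],
) -> float:
    """Average a per-topic metric over every topic that has relevance judgments."""
    scores = [metric(runs.get(topic, []), rel) for topic, rel in qrels.items()]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical toy data: two topics with ranked results and judged relevant documents.
runs = {"t1": ["d1", "d2", "d3", "d4"], "t2": ["d5", "d6"]}
qrels = {"t1": {"d1", "d3"}, "t2": {"d6"}}

map_score = mean_over_topics(average_precision, runs, qrels)
mean_r_precision = mean_over_topics(r_precision, runs, qrels)
```

With the toy data, topic t1 has an average precision of (1/1 + 2/3) / 2 and topic t2 has 1/2, so the MAP is their mean, 2/3; the R-Precision values are 1/2 and 0, giving a mean of 1/4.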

Corresponding author

Correspondence to Daniel Ferrés.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Ferrés, D., Rodríguez, H. (2015). Evaluating Geographical Knowledge Re-Ranking, Linguistic Processing and Query Expansion Techniques for Geographical Information Retrieval. In: Iliopoulos, C., Puglisi, S., Yilmaz, E. (eds) String Processing and Information Retrieval. SPIRE 2015. Lecture Notes in Computer Science, vol 9309. Springer, Cham. https://doi.org/10.1007/978-3-319-23826-5_30

  • DOI: https://doi.org/10.1007/978-3-319-23826-5_30

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-23825-8

  • Online ISBN: 978-3-319-23826-5

  • eBook Packages: Computer Science, Computer Science (R0)
