
Effective Retrieval Model for Entity with Multi-valued Attributes: BM25MF and Beyond

  • Conference paper
Knowledge Engineering and Knowledge Management (EKAW 2012)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7603)

Abstract

The task of entity retrieval is becoming increasingly prevalent as more and more structured information about entities becomes available on the Web in various forms, such as documents embedding metadata (RDF, RDFa, Microdata, Microformats). International benchmarking campaigns, e.g., the Text REtrieval Conference and the Semantic Search Challenge, propose entity-oriented search tracks, which reflects the need for effective search and discovery of entities. In this work, we present a multi-valued attributes model for entity retrieval that extends and generalises existing field-based ranking models. Our model introduces the concept of multi-valued attributes and enables attribute- and value-specific normalisation and weighting. Based on this model, we extend two state-of-the-art field-based ranking functions, BM25F and PL2F, and show through evaluations over heterogeneous datasets that the model significantly improves retrieval performance compared to existing models. Finally, we introduce query-dependent and query-independent weights specifically designed for our model, which yield significant performance improvements.
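To make "attribute- and value-specific normalisation and weighting" more concrete, the Python sketch below first implements the standard BM25F score for one query term: each field's term frequency is length-normalised and weighted, the results are summed into a single pseudo-frequency, and only then is the k1 saturation applied. It then shows a hypothetical multi-valued variant in which an attribute holds a list of values and normalisation happens both per value and per attribute. That variant, including the names `multivalued_term_score`, `value_b` and `avg_value_count`, is an illustrative assumption and is not the BM25MF formula from the paper. The IDF uses the non-negative log(1 + ...) form, which is also an assumption.

```python
import math
from dataclasses import dataclass


@dataclass
class FieldParams:
    weight: float  # w_f: boost applied to this field/attribute
    b: float       # b_f: length-normalisation strength in [0, 1]


def idf(num_docs: int, doc_freq: int) -> float:
    """BM25-style IDF, using the non-negative log(1 + ...) variant."""
    return math.log(1.0 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25f_term_score(tf_by_field, len_by_field, avg_len_by_field,
                     params, term_idf, k1=1.2):
    """Standard BM25F for a single query term.

    Per-field term frequencies are length-normalised and weighted first,
    summed into one pseudo-frequency, and only then saturated with k1.
    """
    pseudo_tf = 0.0
    for field, tf in tf_by_field.items():
        p = params[field]
        norm = 1.0 - p.b + p.b * len_by_field[field] / avg_len_by_field[field]
        pseudo_tf += p.weight * tf / norm
    return term_idf * pseudo_tf / (k1 + pseudo_tf)


def multivalued_term_score(term, values_by_attr, avg_value_len,
                           avg_value_count, params, value_b, term_idf, k1=1.2):
    """HYPOTHETICAL multi-valued variant (illustration only, not BM25MF).

    Each attribute holds a list of values (token lists). The term frequency
    is normalised per value by the value's length (strength value_b[attr]),
    then per attribute by its number of values (strength params[attr].b),
    weighted, and summed before the single k1 saturation.
    """
    pseudo_tf = 0.0
    for attr, values in values_by_attr.items():
        p = params[attr]
        bv = value_b[attr]
        attr_tf = 0.0
        for tokens in values:
            tf = tokens.count(term)
            if tf == 0:
                continue
            value_norm = 1.0 - bv + bv * len(tokens) / avg_value_len[attr]
            attr_tf += tf / value_norm
        attr_norm = 1.0 - p.b + p.b * len(values) / avg_value_count[attr]
        pseudo_tf += p.weight * attr_tf / attr_norm
    return term_idf * pseudo_tf / (k1 + pseudo_tf)


if __name__ == "__main__":
    # Toy entity with a single-valued "label" and a two-valued "comment".
    params = {"label": FieldParams(2.0, 0.75), "comment": FieldParams(1.0, 0.75)}
    entity = {"label": [["acme", "corp"]],
              "comment": [["acme"], ["acme", "inc"]]}
    w = idf(num_docs=1000, doc_freq=10)
    print(multivalued_term_score("acme", entity,
                                 avg_value_len={"label": 2.0, "comment": 1.5},
                                 avg_value_count={"label": 1.0, "comment": 2.0},
                                 params=params,
                                 value_b={"label": 0.5, "comment": 0.5},
                                 term_idf=w))
```

With every `b` and `value_b` set to 0, the two functions return the same score for the same total term counts. This is a quick sanity check that the hypothetical variant generalises BM25F rather than replacing it. Keeping a single k1 saturation over the combined pseudo-frequency follows the BM25F design, so repeated matches across many values still cannot dominate the score linearly.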

Preliminary results of this approach were presented in a technical report at SemSearch 2011 (http://semsearch.yahoo.com/9-Sindice.pdf). We have extended that work with (1) an extension of the PL2F ranking function, (2) a study of optimised normalisation parameters, and (3) a comparison against two other field-based approaches over additional datasets.




Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Campinas, S., Delbru, R., Tummarello, G. (2012). Effective Retrieval Model for Entity with Multi-valued Attributes: BM25MF and Beyond. In: ten Teije, A., et al. Knowledge Engineering and Knowledge Management. EKAW 2012. Lecture Notes in Computer Science, vol 7603. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33876-2_19


  • DOI: https://doi.org/10.1007/978-3-642-33876-2_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-33875-5

  • Online ISBN: 978-3-642-33876-2

  • eBook Packages: Computer Science, Computer Science (R0)
