
Comparison of Normalization Techniques for Metasearch

  • Conference paper
Advances in Information Systems (ADVIS 2002)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2457)

Abstract

It is a well-known fact that combining the retrieval outputs of different search systems for a query, known as metasearch, improves performance on average, provided that the combined systems (1) have compatible outputs, (2) produce accurate estimates of the probability of relevance of documents, and (3) are independent of each other. A normalization technique targets the first requirement: the document scores of different retrieval outputs are brought onto a common scale so that they can be compared across the combined outputs. Score normalization has recently been a subject of research in both the metasearch and information filtering fields. In this paper, we present a different perspective on multiple evidence combination and investigate various normalization techniques, mostly ad hoc in nature, with a special focus on SUM, which shifts the minimum score of each output to zero and then scales the shifted scores so that they sum to one. This formal approach is equivalent, up to a constant factor equal to the number of scored documents, to normalizing the distribution of scores of all documents in a retrieval output by dividing the shifted scores by their sample mean. We conducted extensive experiments using the ad hoc tracks of the third and fifth TREC collections and the CLEF’00 database. We argue that (1) the SUM normalization method is consistently better than the other traditionally proposed methods when combining the outputs of search systems operating on a single database, and (2) for combining the outputs of search systems operating on mutually exclusive databases, SUM is still a valuable alternative to weighting the score distributions of documents by the sizes of their databases.
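The SUM normalization described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the dictionary-of-scores input format, the uniform fallback for an output whose scores are all equal, and the CombSUM-style fusion step (adding normalized scores, with a missing document contributing zero) are all assumptions introduced here. `mean_normalize` divides the shifted scores by their sample mean instead of their sum, so its values are exactly n times those of `sum_normalize`, where n is the number of documents in the output.

```python
import math
from collections import defaultdict
from typing import Dict, List, Mapping, Tuple


def _shift_to_zero(scores: Mapping[str, float]) -> Dict[str, float]:
    """Subtract the output's minimum score so the lowest document scores 0."""
    lo = min(scores.values())
    return {doc: s - lo for doc, s in scores.items()}


def sum_normalize(scores: Mapping[str, float]) -> Dict[str, float]:
    """SUM: shift the minimum score to zero, then scale the shifted scores to sum to one."""
    if not scores:
        return {}
    shifted = _shift_to_zero(scores)
    total = math.fsum(shifted.values())
    if total == 0.0:
        # Assumption: if every score is equal, spread the mass uniformly.
        return {doc: 1.0 / len(shifted) for doc in shifted}
    return {doc: s / total for doc, s in shifted.items()}


def mean_normalize(scores: Mapping[str, float]) -> Dict[str, float]:
    """Divide the shifted scores by their sample mean (= n * sum_normalize)."""
    if not scores:
        return {}
    shifted = _shift_to_zero(scores)
    mean = math.fsum(shifted.values()) / len(shifted)
    if mean == 0.0:
        # Same uniform fallback as sum_normalize, scaled by n.
        return {doc: 1.0 for doc in shifted}
    return {doc: s / mean for doc, s in shifted.items()}


def comb_sum(outputs: List[Mapping[str, float]]) -> List[Tuple[str, float]]:
    """Hypothetical fusion step: add SUM-normalized scores across outputs.

    A document absent from an output contributes zero for that output.
    Ties are broken by document id so the ranking is deterministic.
    """
    fused: Dict[str, float] = defaultdict(float)
    for output in outputs:
        for doc, s in sum_normalize(output).items():
            fused[doc] += s
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Toy scores on deliberately different scales (invented for illustration).
    run_a = {"d1": 12.0, "d2": 8.0, "d3": 4.0}
    run_b = {"d2": 0.75, "d3": 0.5, "d4": 0.25}
    print([doc for doc, _ in comb_sum([run_a, run_b])])  # ['d2', 'd1', 'd3', 'd4']
```

Adding the raw scores of these two toy runs would let `run_a` dominate (its `d1` alone outscores anything in `run_b`). After SUM normalization each output contributes a total mass of one, so `d2`, which ranks well in both runs, comes out on top.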

This material is based on work supported in general by the Center for Intelligent Information Retrieval. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect those of the sponsor(s).

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sever, H., Tolun, M.R. (2002). Comparison of Normalization Techniques for Metasearch. In: Yakhno, T. (eds) Advances in Information Systems. ADVIS 2002. Lecture Notes in Computer Science, vol 2457. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36077-8_13

  • DOI: https://doi.org/10.1007/3-540-36077-8_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-00009-9

  • Online ISBN: 978-3-540-36077-3

  • eBook Packages: Springer Book Archive
