Comparison of Normalization Techniques for Metasearch

Sever, Hayri; Tolun, Mehmet R.

doi:10.1007/3-540-36077-8_13

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2457)

Included in the following conference series:

International Conference on Advances in Information Systems

Abstract

It is a well-known fact that combining the retrieval outputs of different search systems in response to a query, known as metasearch, improves performance on average, provided that the combined systems (1) have compatible outputs, (2) produce accurate estimates of the probability of relevance of documents, and (3) are independent of each other. A normalization technique targets the first requirement: document scores from different retrieval outputs are brought onto a common scale so that they are comparable across the combined outputs. This has been a recent subject of research in the metasearch and information filtering fields. In this paper, we present a different perspective on multiple evidence combination and investigate various normalization techniques, mostly ad hoc in nature, with a special focus on SUM, which shifts the minimum score to zero and then scales the scores so that they sum to one. This formal approach is equivalent to normalizing the distribution of scores of all documents in a retrieval output by dividing them by their sample mean. We have conducted extensive experiments using the ad hoc tracks of the third and fifth TREC collections and the CLEF’00 database. We argue that (1) the SUM normalization method is consistently better than the other traditionally proposed ones when combining outputs of search systems operating on a single database; and (2) for combining outputs of search systems operating on mutually exclusive databases, SUM is still a valuable alternative to weighting the score distributions of documents by the size of their databases.
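
The SUM normalization described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `sum_normalize` and `combine`, the document identifiers, and the example scores are all hypothetical, and combining runs by simply adding the normalized scores is an assumed combination rule that the abstract does not specify. Note that dividing the shifted scores by their sum and dividing them by their sample mean differ only by the constant factor N, the number of documents in the run.

```python
from collections import defaultdict


def sum_normalize(scores):
    """Shift the minimum score to zero, then scale so the scores sum to one.

    `scores` maps a document id to its raw score in one retrieval output.
    """
    if not scores:
        return {}
    lowest = min(scores.values())
    shifted = {doc: s - lowest for doc, s in scores.items()}
    total = sum(shifted.values())
    if total == 0:
        # All raw scores are equal, so the run carries no ordering
        # information; returning zeros is a choice made for this sketch.
        return {doc: 0.0 for doc in shifted}
    return {doc: s / total for doc, s in shifted.items()}


def combine(runs):
    """Add SUM-normalized scores per document across runs.

    Plain addition is an assumption of this sketch, not a rule
    taken from the paper.
    """
    fused = defaultdict(float)
    for run in runs:
        for doc, s in sum_normalize(run).items():
            fused[doc] += s
    # Highest fused score first.
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


# Two hypothetical retrieval outputs on very different score scales.
run_a = {"d1": 12.0, "d2": 7.0, "d3": 2.0}
run_b = {"d2": 0.9, "d3": 0.5, "d4": 0.1}

for doc, score in combine([run_a, run_b]):
    print(doc, round(score, 3))
```

After normalization both runs lie on the same scale regardless of their raw score ranges, so a document retrieved by both systems can accumulate evidence from each.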

This material is based on work supported in general by the Center for Intelligent Information Retrieval. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect those of the sponsor(s).

Author information

Authors and Affiliations

Computer Science Department, University of Massachusetts, 01003, Amherst, MA, USA
Hayri Sever
Department of Computer Engineering, Eastern Mediterranean University, Gazimagusa, TRNC, via Mersin 10, Turkey
Mehmet R. Tolun

Editor information

Editors and Affiliations

Computer Engineering Department, Dokuz Eylul University, 35100, Izmir, Bornova, Turkey
Tatyana Yakhno

Cite this paper

Sever, H., Tolun, M.R. (2002). Comparison of Normalization Techniques for Metasearch. In: Yakhno, T. (ed.) Advances in Information Systems. ADVIS 2002. Lecture Notes in Computer Science, vol 2457. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36077-8_13

DOI: https://doi.org/10.1007/3-540-36077-8_13
Published: 24 October 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-00009-9
Online ISBN: 978-3-540-36077-3
eBook Packages: Springer Book Archive
