Abstract
This paper describes a new algorithm for merging the results returned by remote collections in a distributed information retrieval environment. The algorithm uses only the ranks of the returned documents, which makes it very efficient in environments where the remote collections provide minimal cooperation. Assuming that the correlation between ranks and relevance scores can be expressed through a logistic function, and using documents sampled from the remote collections, the algorithm assigns local scores to the returned ranked documents. It then uses a centralized sample collection and linear regression to assign global scores, producing a final merged document list for the user. The algorithm's effectiveness is measured against two state-of-the-art results merging algorithms, and it is found to outperform them in environments where the remote collections do not provide relevance scores.
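The two-stage idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the logistic form (linear in the rank), the least-squares fit on logit-transformed scores, and all function and variable names (`fit_line`, `fit_rank_to_score`, `local_score`, `merge`) are assumptions made for illustration, and the training pairs are taken as given rather than derived from real sampled documents.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_rank_to_score(ranks, scores):
    """Stage 1 (assumed form): score = 1 / (1 + exp(-(a + b * rank))).

    Fitted by least squares on logit(score); scores must lie in (0, 1).
    """
    logits = [math.log(s / (1.0 - s)) for s in scores]
    return fit_line(ranks, logits)


def local_score(rank, a, b):
    """Estimated local score of the document returned at a given rank."""
    return 1.0 / (1.0 + math.exp(-(a + b * rank)))


def merge(result_lists, rank_models, global_models):
    """Stage 2: map local scores to global scores and merge.

    result_lists:  {collection: [doc_id, ...]} in returned rank order
    rank_models:   {collection: (a, b)} from fit_rank_to_score
    global_models: {collection: (c, d)} with global = c + d * local,
                   e.g. obtained by fit_line on (local, centralized) pairs
    """
    merged = []
    for coll, doc_ids in result_lists.items():
        a, b = rank_models[coll]
        c, d = global_models[coll]
        for rank, doc_id in enumerate(doc_ids, start=1):
            merged.append((c + d * local_score(rank, a, b), doc_id))
    merged.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in merged]
```

In this sketch, each collection gets its own pair of models: one learned from the ranks and scores of sampled documents, and one learned from documents whose scores are also known in the centralized sample. Documents from all collections are then sorted by their estimated global score.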
© 2007 Springer Berlin Heidelberg
About this paper
Cite this paper
Paltoglou, G., Salampasis, M., Satratzemi, M. (2007). Results Merging Algorithm Using Multiple Regression Models. In: Amati, G., Carpineto, C., Romano, G. (eds) Advances in Information Retrieval. ECIR 2007. Lecture Notes in Computer Science, vol 4425. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71496-5_18
DOI: https://doi.org/10.1007/978-3-540-71496-5_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71494-1
Online ISBN: 978-3-540-71496-5
eBook Packages: Computer Science (R0)