Abstract
TREC-style evaluation generally means the use of test collections, an evaluation methodology referred to as the Cranfield paradigm. This paper starts with a short description of the original Cranfield experiment, emphasizing the how and why of the Cranfield framework. The framework is then updated to cover the more recent "batch" evaluations, examining the methodologies used in the various open evaluation campaigns such as TREC. Here again the focus is on the how and why, and in particular on how the older evaluation methodologies have evolved to handle new information access techniques. The final section contains advice on using these existing test collections and building new ones.
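To make the batch-evaluation methodology mentioned above concrete, the following is a minimal sketch in Python of how a ranked run is scored against a set of relevance judgments (qrels) using mean average precision, one of the standard TREC measures. The file names qrels.txt and run.txt and the exact parsing details are illustrative assumptions; in practice campaigns such as TREC use the trec_eval tool for this step.

    # Minimal sketch of TREC-style batch evaluation: scoring a ranked run
    # against relevance judgments (qrels) with mean average precision (MAP).
    # File names and parsing are illustrative assumptions following the
    # standard TREC file conventions.
    from collections import defaultdict

    def load_qrels(path):
        """qrels line format: topic_id  iteration  doc_id  relevance"""
        rel = defaultdict(set)
        with open(path) as f:
            for line in f:
                topic, _, doc, judgment = line.split()
                if int(judgment) > 0:  # treat any positive grade as relevant
                    rel[topic].add(doc)
        return rel

    def load_run(path):
        """run line format: topic_id  Q0  doc_id  rank  score  run_tag"""
        run = defaultdict(list)
        with open(path) as f:
            for line in f:
                topic, _, doc, _, score, _ = line.split()
                run[topic].append((float(score), doc))
        # rank documents per topic by descending retrieval score
        return {t: [d for _, d in sorted(docs, reverse=True)]
                for t, docs in run.items()}

    def average_precision(ranking, relevant):
        """Mean of the precision values at the rank of each relevant document."""
        hits, total = 0, 0.0
        for i, doc in enumerate(ranking, start=1):
            if doc in relevant:
                hits += 1
                total += hits / i
        return total / len(relevant) if relevant else 0.0

    if __name__ == "__main__":
        qrels = load_qrels("qrels.txt")  # assumed file names
        run = load_run("run.txt")
        aps = [average_precision(run.get(t, []), rel) for t, rel in qrels.items()]
        print(f"MAP over {len(aps)} topics: {sum(aps) / len(aps):.4f}")

The key design point of the paradigm is visible in the code: the documents, topics, and judgments are fixed ahead of time, so any number of systems can be scored and compared on identical input without further human assessment.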
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
Cite this chapter
Harman, D. (2013). TREC-Style Evaluations. In: Agosti, M., Ferro, N., Forner, P., Müller, H., Santucci, G. (eds) Information Retrieval Meets Information Visualization. PROMISE 2012. Lecture Notes in Computer Science, vol 7757. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36415-0_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-36414-3
Online ISBN: 978-3-642-36415-0