Information Retrieval: Searching in the 21st Century covers a wide spectrum of problems that concern members of the information retrieval (IR) community. Chapters in the book are written by authors who bring previously demonstrated expertise to the topics they discuss. In addition to the text, each chapter contains exercises geared toward using the book in a classroom setting. In editing the book, Göker and Davies aimed to deliver “comprehensive coverage of the main themes in modern information retrieval,” at a level intended for advanced undergraduates, graduate students, or business professionals (p. xxii). As I will discuss below, this goal situates the book in a competitive area: IR currently enjoys several excellent textbooks. Information Retrieval adds to this domain, especially in its treatment of specialized topics.

Each chapter in Information Retrieval treats a single topic. Some of these topics are broad: e.g. formal IR models, context in IR, natural language processing. Other chapters are of narrower scope, including Web IR, mobile retrieval, and semantic search. The variety of topics covered in Information Retrieval is valuable. It is rare for an introductory text to include an entire chapter on context in information retrieval or on the information needs and searching behavior of image users. This book treats these topics in satisfying depth. Additionally, each chapter in the text contains an accompanying bibliography, providing students with tangible avenues for further study.

The chapters that make up Information Retrieval span the spectrum from theoretical motivations for IR to practical applications that underpin modern search technologies.

Djoerd Hiemstra’s chapter on retrieval models is clear, well historicized, and of obvious usefulness for students of IR. It introduces the most studied IR models (e.g. Boolean, vector space, binary independence, two-Poisson, and language modeling), giving special attention to the way that each model relates historically to earlier retrieval approaches.

Students reading Information Retrieval also get theoretical information in Stuart Watt’s chapter on text categorization. Interestingly, Watt grounds his discussion of familiar issues in text classification in the less studied problem of genre identification. That is, Watt motivates his discussion of text categorization not with the familiar problem of topical classification, but rather with the problem of discriminating between genres of text. Strongly theoretical chapters also include Ayşe Göker, Hans Myrhaug, and Ralf Bierig’s piece on context in IR and Andrew MacFarlane’s treatment of parallel computing in retrieval problems. Finding a theoretical approach to parallel computing in IR is unusual in an introductory text, but this inclusion is emblematic of Information Retrieval’s departure from material covered in already-published texts.

Stefan Rüger’s chapter on multimedia discovery also focuses on high-level issues. Rüger serves the book well in two senses. First, having decided to avoid a highly technical exposition, Rüger grounds his discussion in several compelling examples of real-world systems. Second, Rüger stresses the strong role that exploratory actions (e.g. browsing, query reformulation) play in effective IR, not only in multimedia settings, but in any retrieval problem.

Indeed, Information Retrieval grounds a great deal of its exposition by describing the operation of working IR systems. David Mountain, Hans Myrhaug, and Ayşe Göker detail the challenges of mobile search by offering two case studies, each of which is presented via description and numerous screen captures. John Davies, Alistair Duke, and Atanas Kiryakov’s chapter on semantic search contains a lengthy section on “semantic search tools,” affording the authors (and the reader) ample ground for understanding how annotated data and other Semantic Web resources have informed and can inform IR.

Several chapters treat more canonical introductory IR material. Nick Craswell and David Hawking’s chapter on Web IR introduces crucial problems such as crawling, adversarial IR, advertisement placement, and the generation of document surrogates during result presentation. Daqing He and Jianqiang Wang cover the challenges of cross-language IR. Their exposition introduces basic resources for cross-lingual retrieval such as bilingual dictionaries and parallel corpora. Additionally, He and Wang discuss the practicalities of using these resources, covering both structured queries and more heuristic approaches.

As I noted earlier, several very good introductory IR textbooks are currently available to instructors of IR courses, as well as to people interested in learning about the field independently. In addition to the well-known texts by Baeza-Yates and Ribeiro-Neto (1999) and Witten et al. (1999), recent years have seen publication of new, excellent introductions to IR (Croft et al. 2009; Manning et al. 2008). Students of IR (at the beginning and intermediate levels) now have many choices with respect to textbooks. An obvious question for would-be readers of Information Retrieval is: where does Göker and Davies’ book fit into this crowded field?

When reading Information Retrieval, I had the sense that each chapter’s authors were given generous latitude to cover the specific areas that interested them most. This is both a virtue and a small liability. As noted earlier, readers of Information Retrieval will find that the book’s treatment of each topic departs in focus from other textbooks’ expositions. I learned new material from each chapter of Information Retrieval.

However, Information Retrieval omits topics that are essential for those starting their exploration of IR. For instance, Information Retrieval contains an excellent chapter on user-centered system evaluation. But there is no comparable chapter on Cranfield-style evaluation. Of course, individual chapters review test collection-based evaluation in the course of their exposition. But these treatments are necessarily brief, often referring to the methods of evaluation chosen by particular TREC tracks. Given the pivotal role that TREC (and the Cranfield methodology in general) has played in advancing IR, this omission is problematic.

Several other core issues either go undiscussed or receive only brief attention in Information Retrieval. For example, learning to rank and the MapReduce computational model are absent from the text. These topics have arrived relatively recently in the IR literature but are necessary for fluency in the field.

Of course, IR is a broad science and no book is likely to embrace the full scope of the subject. Despite the omissions that I have noted, the subjects that are included in Information Retrieval offer an opportunity for readers to enjoy some of this breadth. I would recommend Information Retrieval to readers who already have a base of knowledge on core IR concepts. Alternatively, an introductory retrieval course could benefit from inclusion of selected chapters from the text, thus expanding its focus beyond IR’s most canonical topics. In Information Retrieval, accomplished researchers treat topics that anyone interested in IR would benefit from hearing about, whether for the first time or after years in the field.