NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Donna K. Harman, National Institute of Standards and Technology

Okapi at TREC

Stephen E. Robertson, Stephen Walker, Micheline Hancock-Beaulieu, Aarron Gull, Marianna Lau
Centre for Interactive Systems Research, Department of Information Science, City University, Northampton Square, London EC1V 0HB, UK

Advisers: Karen Sparck Jones (University of Cambridge); Peter Willett (University of Sheffield); E. Michael Keen (University of Wales).

Abstract: The Okapi retrieval system is described, technically and in terms of its design principles. These include simplicity, robustness and ease of use. The version of Okapi used for TREC is further discussed. Designing experiments within the TREC constraints but using Okapi's supposed strengths proved problematic, and some compromise was necessary. The official TREC runs were (a) very simple automatic processing of the ad-hoc topics; (b) manually constructed ad-hoc queries; (c) feedback on the manual queries from searchers' relevance judgements; and (d) routing queries automatically obtained using the training set in a form of relevance feedback. The best run (manual with feedback), although not up to the best reported TREC results, was respectable, and an encouragement to further development within the same principles.

1. Introduction

Okapi is an experimental text retrieval system, designed to use simple, robust techniques internally and to present a user interface which requires no training and no knowledge of searching methods or techniques. It is presently accessible by academic users at City University, with the library catalogue and a scientific abstracts journal as databases. It is used for experimentation with and evaluation of novel retrieval techniques.

A design principle of Okapi is that simple techniques, without Boolean logic but with best-match searching, and with little in the way of a manually constructed knowledge base, can give effective and efficient retrieval. "Simple" also implies minimum effort, either manual or machine, either at the set-up stage or at input or at search time. In particular, relevance feedback (which requires little or no additional user effort, since users must make such judgements anyway) provides a mechanism whereby an initial query formulated with no great effort can be improved. Such a search process might be regarded as having something of the character of browsing: an exploration of a topic rather than a precise specification.

In some respects (e.g. highly elaborate topic specifications; no evaluation of interactive systems) TREC does not at all represent the kind of retrieval activities for which Okapi was designed. However, our approach to TREC has been to try to arrive at some compromise between the aims of Okapi and those of TREC. The resulting performance was not spectacular, but was (we believe) respectable enough to encourage us to pursue the ideas further.

2. Background: the Okapi project

The following is a description of Okapi as it existed before the start of TREC-related work ("interactive Okapi"). Section 3 discusses some