SP500207
NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Okapi at TREC
S. Robertson
S. Walker
M. Hancock-Beaulieu
A. Gull
M. Lau
National Institute of Standards and Technology
Donna K. Harman
Okapi at TREC
Stephen E. Robertson, Stephen Walker,
Micheline Hancock-Beaulieu, Aarron Gull, Marianna Lau
Centre for Interactive Systems Research
Department of Information Science
City University
Northampton Square
London EC1V 0HB, UK
Advisers: Karen Sparck Jones (University of Cambridge); Peter Willett (University of Sheffield);
E. Michael Keen (University of Wales).
Abstract: The Okapi retrieval system is
described, technically and in terms of its design
principles. These include simplicity, robustness
and ease of use. The version of Okapi used for
TREC is further discussed. Designing
experiments within the TREC constraints but
using Okapi's supposed strengths proved
problematic, and some compromise was
necessary. The official TREC runs were (a) very
simple automatic processing of the ad-hoc topics;
(b) manually constructed ad-hoc queries; (c)
feedback on the manual queries from searchers'
relevance judgements; and (d) routing queries
automatically obtained using the training set in a
form of relevance feedback. The best run
(manual with feedback), although not up to the
best reported TREC results, was respectable, and
an encouragement to further development within
the same principles.
1. Introduction
Okapi is an experimental text retrieval system,
designed to use simple, robust techniques
internally and to present a user interface which
requires no training and no knowledge of
searching methods or techniques. It is presently
accessible by academic users at City University,
with the library catalogue and a scientific
abstracts journal as databases. It is used for
experimentation with and evaluation of novel
retrieval techniques.
A design principle of Okapi is that simple
techniques, without Boolean logic but with best-
match searching, and with little in the way of a
manually constructed knowledge base, can give
effective and efficient retrieval. `Simple' also
implies minimum effort, whether manual or
machine, at the set-up stage, at input, or at
search time. In particular, relevance feedback
(which requires little or no additional user effort,
since users must make such judgements anyway),
provides a mechanism whereby an initial query
formulated with no great effort can be improved.
Such a search process might be regarded as
having something of the character of browsing: an
exploration of a topic rather than a precise
specification.
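As a rough illustration of the idea above (best-match ranking without Boolean logic, improved by relevance feedback), the following minimal sketch may help. It is not a description of Okapi's internals: this section does not name a weighting function, so the Robertson/Sparck Jones relevance weight is assumed here, and the names rsj_weight and best_match, and the representation of documents as sets of words, are hypothetical.

```python
import math
from collections import Counter


def rsj_weight(n, N, r=0, R=0):
    """Robertson/Sparck Jones relevance weight (assumed for illustration).

    n: documents containing the term     N: documents in the collection
    r: known relevant documents containing the term
    R: known relevant documents
    With r = R = 0 (no feedback yet) this reduces to an
    inverse-document-frequency-like weight.
    """
    return math.log(((r + 0.5) * (N - n - R + r + 0.5)) /
                    ((n - r + 0.5) * (R - r + 0.5)))


def best_match(query_terms, docs, relevant_ids=()):
    """Rank documents by the sum of the weights of the query terms they contain.

    docs maps a document id to a set of words (a hypothetical toy index).
    relevant_ids holds documents the searcher has judged relevant so far.
    """
    N = len(docs)
    R = len(relevant_ids)
    scores = Counter()
    for term in set(query_terms):
        postings = [d for d, words in docs.items() if term in words]
        n = len(postings)
        r = sum(1 for d in postings if d in relevant_ids)
        w = rsj_weight(n, N, r, R)
        for d in postings:
            scores[d] += w
    # Highest score first; ties broken by document id for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

Calling best_match a second time with the searcher's relevance judgements passed as relevant_ids reweights the same query terms, so the ranking can change without the user reformulating the query.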
In some respects (e.g. highly elaborate topic
specifications; no evaluation of interactive
systems) TREC does not at all represent the kind
of retrieval activities for which Okapi was
designed. However, our approach to TREC has
been to try to arrive at some compromise between
the aims of Okapi and those of TREC. The
resulting performance was not spectacular, but
was (we believe) respectable enough to encourage
us to pursue the ideas further.
2. Background: the Okapi project
The following is a description of Okapi as it
existed before the start of TREC-related work
("interactive Okapi"). Section 3 discusses some