NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Okapi at TREC
S. Robertson, S. Walker, M. Hancock-Beaulieu, A. Gull, M. Lau
National Institute of Standards and Technology
Edited by Donna K. Harman
Clearly one would in general expect end-users to
have more domain or subject knowledge,
especially for the kinds of queries provided for
TREC. Highly interactive systems in general, and
Okapi in particular, may be assumed to exploit
such subject knowledge; clearly relevance
feedback in ad-hoc searching can only work well if
it is relatively easy for the user to find some
relevant items from the initial search. In this
sense, we see the present experiment as to some
degree unfavourable to Okapi.
5.3.2 Searching
Searchers were expected to make whatever
interpretations of the topic they deemed
appropriate for the purpose of searching. In other
words, they could use words or phrases taken
from any part of the topic, or from their own
general or specific knowledge. They could also
have used other reference sources. However, they
were encouraged to use the system to help them
refine the search, in the way that an end-user
might explore the possibilities within the system
and try out different combinations of search
terms.
The combination of these ideas with the TREC
rules was a little clumsy and artificial. The
procedure was as follows:
(a) The searcher was given the topic in full, as
received by us.
(b) The searcher examined the topic and chose
some terms as candidates for searching
(possibly including terms not in the topic as
received).
(c) The searcher made exploratory searches,
examining the results, making tentative
relevance judgements and perhaps using the
semi-automatic query expansion facility (see
section 3.3) to suggest new terms.
(d) Having decided on an initial formulation, the
searcher then finished the exploratory session
and started the definitive session.
(e) The definitive session involved two stages, an
initial search and a first iteration feedback
search. The initial search was strictly in
accordance with the selected initial
formulation; the searcher examined the top few
documents, making relevance judgements.
(f) The first iteration feedback was purely
automatic from the relevance judgements,
including re-weighting and automatic
expansion. No further iterations were
conducted.
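The re-weighting step in (f) is not spelled out in this section. As a sketch, the Robertson-Sparck Jones relevance weight — the standard formula underlying Okapi's weighting, with the usual 0.5 point estimates — could be computed as follows (the function name and interface are illustrative, not the paper's own):

```python
import math

def rsj_weight(n, r, N, R):
    """Robertson-Sparck Jones relevance weight with 0.5 point estimates.

    n: number of documents containing the term
    r: number of judged-relevant documents containing the term
    N: total number of documents in the collection
    R: total number of judged-relevant documents
    """
    return math.log(((r + 0.5) * (N - n - R + r + 0.5)) /
                    ((n - r + 0.5) * (R - r + 0.5)))
```

With no relevance information (r = R = 0) this reduces to an inverse-collection-frequency style weight, and judged-relevant occurrences push the weight up — which is what makes a single automatic feedback iteration worthwhile.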
The guidelines to the searchers included the
following:
Time: Searchers were asked to allow very
roughly 30 minutes per topic. In fact, the
average was nearer 50 minutes.
Feedback: The guidance was to assess about the
first 20 documents retrieved by the initial
search, or to stop after finding about 8 relevant
documents, whichever came first.
Relevance: If it seemed to be difficult to find any
relevant items, searchers were encouraged to
make generous relevance judgements, so as to
ensure that there was some basis for feedback
(see also section 6.2 below).
5.3.3 Remarks on the system
The bias in favour of initial formulation terms in
the relevance feedback formula was 2 out of 3
(i.e. 3 supposed relevant documents out of which
2 were supposed to contain the term).
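One way to read this "2 out of 3" bias — an assumption on our part, since the paper states only the ratio — is as phantom relevance counts added for initial-formulation terms before the relevance weight is computed:

```python
def biased_counts(r, R, is_initial_term, bias_r=2, bias_R=3):
    """Apply a '2 out of 3' bias to relevance-feedback counts.

    Initial-formulation terms are treated as if 2 extra relevant
    documents (out of 3 extra) contained them; other candidate terms
    keep their observed counts. This concrete mechanism is a
    hypothetical reading of the bias described in the text.
    """
    if is_initial_term:
        return r + bias_r, R + bias_R
    return r, R
```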
Searchers were able to use the Boolean facility
described in section 3.2, for example to treat an
expression such as (A and B) as if it were a single
term, to be weighted like any other. However, the
emphasis was on the usual (in the Okapi context)
weighted searching of single terms, and this
facility was used only occasionally, and only as
part of larger best-match searches. In other
words, this use did not compromise the
characteristic of weighted searching as truly "best
match", with all the flexibility that that implies.
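Treating an expression such as (A and B) as a single weightable term can be sketched as follows; the postings-set representation here is hypothetical, not the Okapi implementation:

```python
def composite_term_postings(postings, terms):
    """Treat a Boolean AND of terms as one searchable unit.

    Its posting set is the intersection of the component terms'
    postings, and its 'document frequency' is the size of that
    intersection, so the composite can be weighted like any single
    term in a best-match search.

    postings: dict mapping term -> set of document ids (assumed layout)
    terms: the component terms of the AND expression
    """
    docs = set.intersection(*(postings[t] for t in terms))
    return docs, len(docs)
```

Because the composite contributes one weight to the document score like any other term, mixing a few such expressions into a query leaves the overall search a genuine best-match ranking rather than a Boolean filter.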
5.3.4 Choice of terms
The terms chosen by the searchers may be briefly
characterized by the following statistics:
Average number of terms               12.9
Terms appearing in the topic          10.5 (81%)
Terms appearing in different fields:
  Description                          3.4
  Narrative                            6.0
  Concept                              7.5
  Others                               2.9
(these add up to more than the total because a term may
occur in more than one field).
For comparison, the Concept field has around 19-
20 terms on average.