SP500215
NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)
Okapi at TREC-2
chapter
S. Robertson
S. Walker
S. Jones
M. Hancock-Beaulieu
M. Gatford
National Institute of Standards and Technology
D. K. Harman
Taking advantage of the very full topic statements to
derive query term frequency weights gives another sub-
stantial improvement in the automatic ad-hoc results.
Comparing the top row of Table 2 with the top row
of Table 1, there is a 20% increase in average precision.
The "noise" effect of the narrative and description fields
is far more than outweighed by the information they
give about the relative importance of terms (compare
the "TCND" row of Table 1 with the top row of Table
2).
It remains to be discovered how well these new mod-
els perform in searching other types of database. Term
frequency and document length components may not be
very useful in searching brief records with controlled in-
dexing, but one would expect these models to do well
on abstracts. It is also rare to have query statements
which are as full as the TIPSTER ones, so there are
many situations in which a qtf component would have
little or no effect.
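The point about brief queries can be illustrated with a small sketch. It assumes that query term frequency enters each term weight as a saturating factor qtf / (k3 + qtf). The function names, the constant k3 = 8.0 and the toy weights are illustrative assumptions, not values from the runs reported here.

```python
def qtf_factor(qtf, k3=8.0):
    """Saturating query term frequency factor (assumed form: qtf / (k3 + qtf))."""
    return qtf / (k3 + qtf)


def score(doc_weights, query_tf, k3=8.0):
    """Sum per-term document weights, each scaled by the qtf factor.

    doc_weights: term -> weight of the term in one document (toy values)
    query_tf:    term -> number of occurrences of the term in the query
    """
    return sum(
        w * qtf_factor(query_tf[t], k3)
        for t, w in doc_weights.items()
        if t in query_tf
    )


def rank(docs, query_tf, k3=8.0):
    """Return document ids ordered by descending score."""
    return sorted(docs, key=lambda d: score(docs[d], query_tf, k3), reverse=True)


# Toy per-document term weights (hypothetical numbers).
docs = {
    "d1": {"oil": 2.0, "spill": 0.5},
    "d2": {"oil": 0.5, "spill": 1.8},
}

short_query = {"oil": 1, "spill": 1}  # brief query: every qtf is 1
full_query = {"oil": 1, "spill": 4}   # full statement: "spill" is repeated
```

With short_query every term receives the same factor, so the ranking is the one the unscaled weights would give (d1 before d2). With full_query the repeated term dominates and the order reverses.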
7.2 Routing
Our results here (Table 4) were relatively good, and fur-
ther improved when re-run with BM11. However, the
TREC routing scenario is perhaps not particularly re-
alistic, given the large amount of relevance information,
which we made full use of as the sole source of query
terms. In addition, the best of our runs depended on
a long series of retrospective trials in which the num-
ber of query terms was varied. In a real-world situation
one would have to cope with the early stages when there
would be few documents and little relevance information
(initially none at all). It would be necessary to develop
a term selection and weighting procedure which was ca-
pable of progressing smoothly from a minimum of prior
information up to a TREC-type situation. It may be
possible to come up with a decision procedure for term
selection using something similar to the selection value
w(i) x [OCRerr]. Perhaps a future TREC could include some
more restrictive routing emulations.
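A minimal sketch of selection-value term ranking follows, using the relevance weight of [5] with the usual 0.5 corrections. The second factor of the selection value is illegible in the source, so the sketch assumes it is r, the number of known relevant documents containing the term. That choice, the function names and the toy counts are all assumptions for illustration.

```python
import math


def rsj_weight(r, n, R, N):
    """Relevance weight in the style of [5], with 0.5 corrections.

    r: known relevant documents containing the term
    n: documents containing the term
    R: known relevant documents
    N: documents in the collection
    """
    return math.log(
        ((r + 0.5) * (N - n - R + r + 0.5))
        / ((n - r + 0.5) * (R - r + 0.5))
    )


def select_terms(stats, R, N, max_terms):
    """Rank candidate terms by an ASSUMED selection value r * weight.

    stats: term -> (r, n). Returns the top max_terms (term, weight) pairs.
    """
    scored = []
    for term, (r, n) in stats.items():
        w = rsj_weight(r, n, R, N)
        scored.append((r * w, term, w))
    scored.sort(reverse=True)  # highest selection value first
    return [(term, w) for _, term, w in scored[:max_terms]]


# Toy counts (hypothetical): term -> (r, n), with R = 10 and N = 1000.
stats = {"a": (8, 20), "b": (2, 500), "c": (5, 10)}
selected = select_terms(stats, R=10, N=1000, max_terms=2)
```

With no relevance information (R = 0, r = 0) the weight reduces to log((N - n + 0.5) / (n + 0.5)), but this assumed selection value is zero for every term. That is one way of seeing why a procedure that progresses smoothly from no prior information would need something more.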
7.3 Interactive ad-hoc searching
The result of this trial was disappointing except on
precision at 100 documents (Table 5), scarcely better than
the official automatic ad-hoc run. On three topics it
gave the best result of any of our runs, and two more
were good, but the remaining 45 ranged from poor to
abysmal. Little analysis has yet been done. For some
topics it is clear that the search never got off the ground
because the searcher was unable to find enough relevant
documents to provide reliable feedback information, but
the mean number found per topic was ten, which should
have been enough to give reasonable results (cf. Table
6, where ten feedback documents perform quite well).
Currently, there are discussions towards a more realistic
set of rules for interactive searching for TREC-3, and
we hope to develop a better procedure and interface.
7.4 Prospects
Paragraphs
When searching full text collections one often does not
want to search, or even necessarily to retrieve, complete
documents. Our new probabilistic models do not apply
to documents where the verbosity hypothesis does not
apply (Section 2.3). Some of the TREC-2 participants
searched "paragraphs" rather than documents, and this
is clearly right, provided a sensible division procedure
can be achieved. We made some progress towards de-
veloping a "paragraph" database model for the Okapi
system, but there has not been time to implement it.
Further work then needs to be done on methods of deriv-
ing the retrieval value of a document from the retrieval
value of its constituent paragraphs.
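As a rough illustration of that open question, the sketch below shows two simple candidate rules for combining paragraph values into a document value: the best single paragraph, and the sum of the k best. Both rules, the function names and the toy scores are assumptions for illustration. Nothing here indicates which combination, if any, works well.

```python
def doc_score_max(paragraph_scores):
    """Document value = value of its best paragraph (0.0 if there are none)."""
    return max(paragraph_scores, default=0.0)


def doc_score_top_k(paragraph_scores, k=3):
    """Document value = sum of its k highest paragraph values."""
    return sum(sorted(paragraph_scores, reverse=True)[:k])


# Toy paragraph scores (hypothetical numbers).
focused = [0.1, 3.0, 0.2]       # one strongly matching paragraph
diffuse = [1.2, 1.3, 1.1, 1.2]  # several moderately matching paragraphs
```

The two rules can disagree: the maximum prefers the focused document, while the top-3 sum prefers the diffuse one. The choice of rule therefore affects the ranking and would need to be evaluated.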
Parameter estimation
Work is in progress on methods of using logistic regres-
sion or similar techniques to estimate the parameters
for the new models.
Derivation and use of phrases and term
proximity
A few results are reported in Table 3. They are not
particularly encouraging. There is probably scope for
further experiments in this area, not only on tuples of
adjacent words but also on Keen-type [9] weighting of
query term clusters in retrieved documents.
References
[1] D.K. Harman (Ed.), The First Text REtrieval
Conference (TREC-1). Gaithersburg, MD: NIST,
1993.
[2] Robertson, S.E. et al. Okapi at TREC. In: [1]
(pp. 21-30).
[3] Walker, S. and Hancock-Beaulieu, M. Okapi at
City: an evaluation facility for interactive IR. London:
British Library, 1991. (British Library Research
Report 6056.)
[4] Hancock-Beaulieu, M.M. and Walker, S. An evaluation
of automatic query expansion in an online
library catalogue. Journal of Documentation, 48,
Dec. 1992, 406-421.
[5] Robertson, S.E. and Sparck Jones, K. Relevance
weighting of search terms. Journal of the American
Society for Information Science, 27, 1976, 129-146.