NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)

Okapi at TREC-2
S. Robertson, S. Walker, S. Jones, M. Hancock-Beaulieu, M. Gatford
National Institute of Standards and Technology
D. K. Harman

Taking advantage of the very full topic statements to derive query term frequency weights gives another substantial improvement in the automatic ad-hoc results. Comparing the top row of Table 2 with the top row of Table 1, there is a 20% increase in average precision. The "noise" effect of the narrative and description fields is far more than outweighed by the information they give about the relative importance of terms (compare the "TCND" row of Table 1 with the top row of Table 2).

It remains to be discovered how well these new models perform in searching other types of database. Term frequency and document length components may not be very useful in searching brief records with controlled indexing, but one would expect these models to do well on abstracts. It is also rare to have query statements which are as full as the TIPSTER ones, so there are many situations in which a qtf component would have little or no effect.

7.2 Routing

Our results here (Table 4) were relatively good, and further improved when re-run with BM11. However, the TREC routing scenario is perhaps not particularly realistic, given the large amount of relevance information, which we made full use of as the sole source of query terms. In addition, the best of our runs depended on a long series of retrospective trials in which the number of query terms was varied. In a real-world situation one would have to cope with the early stages when there would be few documents and little relevance information (initially none at all). It would be necessary to develop a term selection and weighting procedure which was capable of progressing smoothly from a minimum of prior information up to a TREC-type situation.
It may be possible to come up with a decision procedure for term selection using something similar to the selection value w(i) x [OCRerr]. Perhaps a future TREC could include some more restrictive routing emulations.

7.3 Interactive ad-hoc searching

The result of this trial was disappointing except on precision at 100 documents (Table 5), being scarcely better than the official automatic ad-hoc run. On three topics it gave the best result of any of our runs, and two more were good, but the remaining 45 ranged from poor to abysmal. Little analysis has yet been done. For some topics it is clear that the search never got off the ground because the searcher was unable to find enough relevant documents to provide reliable feedback information, but the mean number found per topic was ten, which should have been enough to give reasonable results (cf. Table 6, where ten feedback documents perform quite well).

Currently, there are discussions towards a more realistic set of rules for interactive searching for TREC-3, and we hope to develop a better procedure and interface.

7.4 Prospects

Paragraphs

When searching full text collections one often does not want to search, or even necessarily to retrieve, complete documents. Our new probabilistic models do not apply to documents where the verbosity hypothesis does not apply (Section 2.3). Some of the TREC-2 participants searched "paragraphs" rather than documents, and this is clearly right, provided a sensible division procedure can be achieved. We made some progress towards developing a "paragraph" database model for the Okapi system, but there has not been time to implement it. Further work then needs to be done on methods of deriving the retrieval value of a document from the retrieval value of its constituent paragraphs.

Parameter estimation

Work is in progress on methods of using logistic regression or similar techniques to estimate the parameters for the new models.
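As an illustration of the open question raised under Paragraphs above (how to derive the retrieval value of a document from that of its constituent paragraphs), the following Python sketch shows two simple candidate rules. Neither rule is taken from the paper or from the Okapi system; the function names, the top-k parameter and the example scores are all hypothetical, and the paragraph scores are assumed to come from some existing weighting function.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def max_score(paragraph_scores: Sequence[float]) -> float:
    """Hypothetical rule 1: a document is as good as its best paragraph."""
    return max(paragraph_scores)


def top_k_mean(paragraph_scores: Sequence[float], k: int = 2) -> float:
    """Hypothetical rule 2: mean of the k best paragraphs (fewer if the document is short)."""
    best = sorted(paragraph_scores, reverse=True)[:k]
    return sum(best) / len(best)


def rank_documents(
    paragraph_scores_by_doc: Dict[str, List[float]],
    combine: Callable[[Sequence[float]], float] = max_score,
) -> List[Tuple[str, float]]:
    """Combine paragraph scores into one value per document and sort descending."""
    scored = [
        (doc_id, combine(scores))
        for doc_id, scores in paragraph_scores_by_doc.items()
        if scores  # skip documents with no scored paragraphs
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up paragraph scores, for illustration only.
example = {
    "doc_a": [0.2, 3.1, 0.4],       # one strong paragraph
    "doc_b": [1.5, 1.6, 1.4, 1.7],  # uniformly moderate
    "doc_c": [2.0],                 # a single-paragraph document
}

if __name__ == "__main__":
    print(rank_documents(example))              # best-paragraph rule
    print(rank_documents(example, top_k_mean))  # top-2 mean rule
```

With these made-up scores the best-paragraph rule ranks doc_a first, whereas the top-2 mean ranks doc_c first, which shows how sensitive the final ranking can be to the choice of combination rule.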
Derivation and use of phrases and term proximity

A few results are reported in Table 3. They are not particularly encouraging. There is probably scope for further experiments in this area, not only on tuples of adjacent words but also on Keen-type [9] weighting of query term clusters in retrieved documents.

References

[1] Harman, D.K. (Ed.). The First Text REtrieval Conference (TREC-1). Gaithersburg, MD: NIST, 1993.
[2] Robertson, S.E. et al. Okapi at TREC. In: [1] (pp. 21-30).
[3] Walker, S. and Hancock-Beaulieu, M. Okapi at City: an evaluation facility for interactive IR. London: British Library, 1991. (British Library Research Report 6056.)
[4] Hancock-Beaulieu, M.M. and Walker, S. An evaluation of automatic query expansion in an online library catalogue. Journal of Documentation, 48, Dec. 1992, 406-421.
[5] Robertson, S.E. and Sparck Jones, K. Relevance weighting of search terms. Journal of the American Society for Information Science, 27, 1976, 129-146.