SP500207
NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Optimizing Document Indexing and Search Term Weighting Based on Probabilistic Models
N. Fuhr
C. Buckley
National Institute of Standards and Technology
Donna K. Harman
Optimizing Document Indexing and Search Term Weighting
Based on Probabilistic Models
Norbert Fuhr* Chris Buckley†
Abstract
We describe the application of probabilistic indexing and retrieval methods to the TREC
material. For document indexing, we apply a description-oriented approach which uses relevance
feedback information from previous queries run on the same collection. This method is also very
flexible w.r.t. the underlying document representation. In our experiments, we consider single
words and phrases and use polynomial functions for mapping the statistical parameters of these
terms onto probabilistic indexing weights. Based on these weights, a linear (utility-theoretic)
retrieval function is applied when no relevance feedback data is available for the specific query.
Otherwise, the retrieval-with-probabilistic-indexing model can be used. The experimental results
show excellent performance in both cases, but also indicate possible improvements.
1 Learning in IR
[Figure 1: Learning approaches in IR. Two panels, each spanning the dimensions terms, documents, and queries and marking a learning region and an application region. Left panel: routing queries (search term weighting from relevance feedback). Right panel: ad-hoc queries (description-oriented indexing).]
Figure 1 shows two major learning approaches that are used in IR, both of which are applicable to
the tasks to be performed within TREC. For the routing queries, we have relevance feedback data for
some documents w.r.t. a specific query, and then the system has to rank further documents for the
same query. As indicated by the third dimension, our knowledge is restricted to the terms we have
*University of Dortmund, Informatik VI, P.O. Box 500500, W-4600 Dortmund 50, Germany, fuhr@ls6.informatik.uni-dortmund.de
†Department of Computer Science, Upson Hall, Cornell University, Ithaca, NY 14853, USA, chrisb@cs.cornell.edu