NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Donna K. Harman (ed.), National Institute of Standards and Technology

Optimizing Document Indexing and Search Term Weighting Based on Probabilistic Models

Norbert Fuhr* Chris Buckley†

*University of Dortmund, Informatik VI, P.O. Box 500500, W-4600 Dortmund 50, Germany, fuhr@ls6.informatik.uni-dortmund.de
†Department of Computer Science, Upson Hall, Cornell University, Ithaca, NY 14853, USA, chrisb@cs.cornell.edu

Abstract

We describe the application of probabilistic indexing and retrieval methods to the TREC material. For document indexing, we apply a description-oriented approach which uses relevance feedback information from previous queries run on the same collection. This method is also very flexible with respect to the underlying document representation. In our experiments, we consider single words and phrases and use polynomial functions for mapping the statistical parameters of these terms onto probabilistic indexing weights. Based on these weights, a linear (utility-theoretic) retrieval function is applied when no relevance feedback data is available for the specific query. Otherwise, the retrieval-with-probabilistic-indexing model can be used. The experimental results show excellent performance in both cases, but also indicate possible improvements.

1 Learning in IR

[Figure 1: Learning approaches in IR. Two panels, each with the axes terms, documents, and queries and with regions marked "learning" and "application": search term weighting from relevance feedback (routing queries) and description-oriented indexing (ad-hoc queries).]

Figure 1 shows two major learning approaches that are used in IR, both of which are applicable to the tasks to be performed within TREC. For the routing queries, we have relevance feedback data for some documents with respect to a specific query, and then the system has to rank further documents for the same query. As indicated by the third dimension, our knowledge is restricted to the terms we have