SP500215 NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)
Probabilistic Learning Approaches for Indexing and Retrieval with the TREC-2 Collection
N. Fuhr, U. Pfeifer, C. Bremkamp, M. Pollmann
National Institute of Standards and Technology, D. K. Harman

[Plot of the runs dortL2 and dortQ2; x-axis: Recall (0 to 1), y-axis: Precision (0 to 1)]

Figure 1: Recall-precision curves of ad-hoc runs

regression optimizes the overall performance, but not necessarily the retrieval quality when only the top ranking documents are considered. With regard to the moderate results for the reg query term weighting method, the good performance of dortQ2 obviously stems from the quality of our document indexing method.

run                                   dortL2    dortQ2
query term weighting                  nnn       reg

average precision:
  Prec. Avg.                          0.3151    0.3340

query-wise comparison with median:
  Prec. Avg.                          37:2      45:4
  Prec. @ 100 docs                    35:11     34:7
  Prec. @ 1000 docs                   37:10     45:2

Best/worst results:
  Prec. Avg.                          3/0       3(1)/0
  Prec. @ 100 docs                    3(2)/1    4(1)/0(2)
  Prec. @ 1000 docs                   6(1)/0    9(1)/0

dortL2 vs. dortQ2:
  Prec. Avg.                          21:29
  Prec. @ 100 docs                    22:24
  Prec. @ 1000 docs                   17:29

Table 7: Results for adhoc queries

4 Query term weighting for routing queries

4.1 Theoretical background

For the routing queries, the retrieval-with-probabilistic-indexing (RPI) model described in [Fuhr 89a] was applied.
The corresponding retrieval function is based on the following parameters:

  $u_{im}$   indexing weight of term $t_i$ in document $d_m$
  $D_k^R$    set of documents judged relevant for query $q_k$
  $p_{ik}$   expectation of the indexing weight of term $t_i$ in $D_k^R$
  $D_k^N$    set of documents judged nonrelevant for query $q_k$
  $r_{ik}$   expectation of the indexing weight of $t_i$ in $D_k^N$

The parameters $p_{ik}$ and $r_{ik}$ can be estimated based on relevance feedback data as follows:

$$p_{ik} = \frac{1}{|D_k^R|} \sum_{d_m \in D_k^R} u_{im}$$

$$r_{ik} = \frac{1}{|D_k^N|} \sum_{d_m \in D_k^N} u_{im}$$

Then the query term weight is computed by the formula

$$c_{ik} = \frac{p_{ik}\,(1 - r_{ik})}{r_{ik}\,(1 - p_{ik})} - 1$$
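
To make the two estimates and the weight formula concrete, the following is a minimal Python sketch, not the authors' implementation. The function names, the toy indexing weights, and the document identifiers are invented for illustration. Two further assumptions are not taken from the text: a document in which the term does not occur is counted with indexing weight 0, and the estimates are clamped away from 0 and 1 so that the quotient stays defined.

```python
from typing import Dict, Iterable


def mean_indexing_weight(weights: Dict[str, float], docs: Iterable[str]) -> float:
    """Estimate p_ik or r_ik: the mean indexing weight u_im of one term
    over a set of judged documents.

    Assumption of this sketch: documents missing from `weights` do not
    contain the term and contribute weight 0.
    """
    doc_list = list(docs)
    if not doc_list:
        raise ValueError("need at least one judged document")
    return sum(weights.get(d, 0.0) for d in doc_list) / len(doc_list)


def query_term_weight(p_ik: float, r_ik: float, eps: float = 1e-6) -> float:
    """Compute c_ik = p_ik (1 - r_ik) / (r_ik (1 - p_ik)) - 1.

    Clamping both estimates to [eps, 1 - eps] is an assumption of this
    sketch; the text does not say how boundary values are handled.
    """
    p = min(max(p_ik, eps), 1.0 - eps)
    r = min(max(r_ik, eps), 1.0 - eps)
    return p * (1.0 - r) / (r * (1.0 - p)) - 1.0


# Toy relevance feedback data for one term t_i and one query q_k (invented).
u_i = {"d1": 0.8, "d2": 0.4, "d4": 0.2}   # indexing weights u_im
relevant = ["d1", "d2", "d3"]             # D_k^R
nonrelevant = ["d4", "d5"]                # D_k^N

p_ik = mean_indexing_weight(u_i, relevant)      # (0.8 + 0.4 + 0) / 3
r_ik = mean_indexing_weight(u_i, nonrelevant)   # (0.2 + 0) / 2
c_ik = query_term_weight(p_ik, r_ik)

print(f"p_ik={p_ik:.3f}  r_ik={r_ik:.3f}  c_ik={c_ik:.3f}")
```

In this toy case the term is more strongly indexed in the relevant documents than in the nonrelevant ones, so the weight is positive. When both estimates coincide the weight is 0, and when the term is more typical of nonrelevant documents the weight becomes negative, with -1 as its lower bound.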