SP500215
NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)
Probabilistic Learning Approaches for Indexing and Retrieval with the TREC-2 Collection
chapter
N. Fuhr
U. Pfeifer
C. Bremkamp
M. Pollmann
National Institute of Standards and Technology
D. K. Harman
Figure 1: Recall-precision curves of ad-hoc runs (curves: dortL2, dortQ2; x-axis: Recall, y-axis: Precision)
regression optimizes the overall performance, but not
necessarily the retrieval quality when only the top-ranking
documents are considered. Given the moderate results for the
reg query term weighting method, the good performance of
dortQ2 evidently stems from the quality of our document
indexing method.
run                       dortL2     dortQ2
query term weighting      nnn        reg
average precision:
  Prec. Avg.              0.3151     0.3340
query-wise comparison with median:
  Prec. Avg.              37:2       45:4
  Prec. @ 100 docs        35:11      34:7
  Prec. @ 1000 docs       37:10      45:2
Best/worst results:
  Prec. Avg.              3/0        3(1)/0
  Prec. @ 100 docs        3(2)/1     4(1)/0(2)
  Prec. @ 1000 docs       6(1)/0     9(1)/0
dortL2 vs. dortQ2:
  Prec. Avg.              21:29
  Prec. @ 100 docs        22:24
  Prec. @ 1000 docs       17:29
Table 7: Results for ad-hoc queries
4 Query term weighting for routing queries
4.1 Theoretical background
For the routing queries, the retrieval-with-probabilistic-indexing
(RPI) model described in [Fuhr 89a] was applied. The corresponding
retrieval function is based on the following parameters:
$u_{im}$  indexing weight of term $t_i$ in document $d_m$,
$D_k^R$  set of documents judged relevant for query $q_k$,
$p_{ik}$  expectation of the indexing weight of term $t_i$ in $D_k^R$,
$D_k^N$  set of documents judged nonrelevant for query $q_k$,
$r_{ik}$  expectation of the indexing weight of term $t_i$ in $D_k^N$.
The parameters $p_{ik}$ and $r_{ik}$ can be estimated based on
relevance feedback data as follows:
$$p_{ik} = \frac{1}{|D_k^R|} \sum_{d_m \in D_k^R} u_{im}$$
$$r_{ik} = \frac{1}{|D_k^N|} \sum_{d_m \in D_k^N} u_{im}$$
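The two estimates are plain averages of indexing weights over the judged documents. A minimal Python sketch follows, assuming the indexing weights of one term are held in a dictionary keyed by document id. The function name `estimate_expectation`, the toy document ids, and the convention that a document without an entry has weight 0 are assumptions made for illustration and are not taken from the paper.

```python
from statistics import fmean


def estimate_expectation(weights_by_doc, judged_docs):
    """Average indexing weight of one term over a set of judged documents.

    weights_by_doc: dict mapping document id -> indexing weight u_im of the term.
                    A document missing from the dict is treated as weight 0
                    (an assumption of this sketch).
    judged_docs:    document ids in D_k^R (relevant) or D_k^N (nonrelevant).
    """
    docs = list(judged_docs)
    if not docs:
        raise ValueError("need at least one judged document")
    return fmean([weights_by_doc.get(d, 0.0) for d in docs])


# Hypothetical toy data for one term t_i and one query q_k.
u_i = {"d1": 0.8, "d2": 0.4, "d3": 0.1}
relevant = ["d1", "d2"]            # D_k^R
nonrelevant = ["d3", "d4", "d5"]   # D_k^N; d4 and d5 do not contain the term

p_ik = estimate_expectation(u_i, relevant)      # (0.8 + 0.4) / 2
r_ik = estimate_expectation(u_i, nonrelevant)   # (0.1 + 0 + 0) / 3
```

With sparse feedback data such averages rest on very few documents, so a real system would likely need some handling of empty or tiny judged sets; the sketch only raises an error in the empty case.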
Then the query term weight is computed by the formula
$$c_{ik} = \frac{p_{ik}(1 - r_{ik})}{r_{ik}(1 - p_{ik})} - 1$$
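As a worked illustration of the weight formula, the following Python sketch evaluates $c_{ik}$ for given estimates. The function name `rpi_query_term_weight` and the choice to reject boundary values (where the denominator would vanish) are assumptions of this sketch; the text does not say how such cases were handled.

```python
def rpi_query_term_weight(p_ik, r_ik):
    """Compute c_ik = p_ik * (1 - r_ik) / (r_ik * (1 - p_ik)) - 1.

    p_ik: expected indexing weight of the term in the relevant documents.
    r_ik: expected indexing weight of the term in the nonrelevant documents.
    """
    # The denominator r_ik * (1 - p_ik) is zero for r_ik = 0 or p_ik = 1.
    # Rejecting these inputs is a choice made for this sketch only.
    if not (0.0 < r_ik <= 1.0 and 0.0 <= p_ik < 1.0):
        raise ValueError("requires 0 < r_ik <= 1 and 0 <= p_ik < 1")
    return p_ik * (1.0 - r_ik) / (r_ik * (1.0 - p_ik)) - 1.0


# Term with higher expected weight in relevant documents: positive weight.
c_pos = rpi_query_term_weight(0.6, 0.2)   # 0.48 / 0.08 - 1, about 5
# Equal expectations: the term does not discriminate, weight is about 0.
c_zero = rpi_query_term_weight(0.3, 0.3)
# Term with higher expected weight in nonrelevant documents: negative weight.
c_neg = rpi_query_term_weight(0.2, 0.6)   # 0.08 / 0.48 - 1, about -0.83
```

The three calls show the qualitative behaviour of the formula: the weight is positive when $p_{ik} > r_{ik}$, zero when the two expectations coincide, and negative (but never below -1) when $p_{ik} < r_{ik}$.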