NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)
D. K. Harman (Ed.), National Institute of Standards and Technology

Probabilistic Learning Approaches for Indexing and Retrieval with the TREC-2 Collection

N. Fuhr, U. Pfeifer, C. Bremkamp, M. Pollmann
of query-document pairs with relevance judgements in order to determine the coefficients a_i. The values y_j can be computed from the number of query terms, the values of the query term features and the document indexing weights.

For the experiments, the following parameters were considered as query term features:

x_0 = 1 (constant)
x_1 = tf (within-query frequency)
x_2 = log tf
x_3 = tf . idf
x_4 = is_phrase
x_5 = in_title (= 1 if the term occurs in the query title, = 0 otherwise)

For most of our experiments, we used only the parameter vector x = (x_0, ..., x_4)^T. The full vector is denoted as x'. Below, we call this query term weighting method reg.

This method is compared with the standard SMART weighting schemes:

nnn: c_ik = tf
ntc: c_ik = tf . idf
lnc: c_ik = 1 + log tf
ltc: c_ik = (1 + log tf) . idf
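To make the weighting schemes concrete, the following minimal Python sketch computes the reg weight (the scalar product of the regression coefficients and the feature vector) and the four SMART weights. The helper names and the assumption of an already-estimated coefficient vector a are illustrative, not taken from the paper.

    import math

    def query_term_features(qtf, idf, is_phrase, in_title, full_vector=False):
        # Feature vector x = (x_0, ..., x_4)^T as defined above; with
        # full_vector=True the in_title feature (x_5) is appended,
        # giving the extended vector x'.
        x = [
            1.0,                        # x_0: constant
            qtf,                        # x_1: tf (within-query frequency)
            math.log(qtf),              # x_2: log tf
            qtf * idf,                  # x_3: tf . idf
            1.0 if is_phrase else 0.0,  # x_4: is_phrase
        ]
        if full_vector:
            x.append(1.0 if in_title else 0.0)  # x_5: in_title
        return x

    def reg_weight(a, x):
        # reg query term weight: scalar product a^T x
        return sum(ai * xi for ai, xi in zip(a, x))

    def smart_weight(scheme, tf, idf):
        # The four SMART schemes used for comparison, as given above
        if scheme == "nnn":
            return tf
        if scheme == "ntc":
            return tf * idf
        if scheme == "lnc":
            return 1.0 + math.log(tf)
        if scheme == "ltc":
            return (1.0 + math.log(tf)) * idf
        raise ValueError("unknown scheme: " + scheme)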
3.2 Experiments
In order to have three different samples for learning and/or testing purposes, we used the following combinations of query sets and document sets as samples: Q3/D12 was used as the training sample for the reg method, and both Q1/D12 and Q2/D3 were used for testing. As evaluation measure, we consider the 11-point average of precision (i.e., the average of the precision values at 0.0, 0.1, ..., 1.0 recall).
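For reference, the 11-point average can be computed from a query's recall-precision points as sketched below. This is a generic illustration; the interpolation rule (taking the maximum precision at any recall level >= the cutoff) is the usual convention and is assumed here, since the text only names the measure.

    def eleven_point_average(rp_points):
        # rp_points: list of (recall, precision) pairs for one query.
        # Interpolated precision at recall level r is the maximum
        # precision observed at any recall >= r (assumed convention).
        levels = [i / 10.0 for i in range(11)]  # 0.0, 0.1, ..., 1.0
        total = 0.0
        for r in levels:
            candidates = [p for rec, p in rp_points if rec >= r]
            total += max(candidates) if candidates else 0.0
        return total / len(levels)

    # Example: eleven_point_average([(0.1, 0.8), (0.5, 0.5), (1.0, 0.2)])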
                  sample
QTW     Q1/D12    Q2/D3
nnn     0.2303
ntc     0.2754
lnc     0.2291    0.2601
ltc     0.2826    0.2783
reg     0.2698    0.2678

Table 1: Global results for single words
First, we considered single words only. Table 1 shows the results of the different query term weighting (QTW) methods. It should be noted that the ntc and ltc methods perform better than nnn and lnc. This finding is somewhat different from the results presented in [Fuhr & Buckley 91], where the nnn weighting scheme gave us better results than the ntc method for LSP indexing. However, in the earlier experiments, we used only fairly small databases, and the queries also were much shorter than in the TREC collection. These facts may account for the different results.
In a second series of experiments, we varied the sample size and the set of features of the regression method (table 2). Besides using every document from the learning sample, we considered only every 100th and every 1000th document from the database, as well as only those documents for which explicit relevance judgements were available. As the results show almost no differences, it seems to be sufficient to use only a small portion of the database as training sample in order to save computation time. The additional consideration of the occurrence of a term in the query title also did not affect the results. So query titles seem not to be very significant. It is an open question whether other parts of the queries are more significant, so that their consideration as an additional feature would affect retrieval quality.

                                           test sample
run   learning sample      features    Q1/D12    Q2/D3
1     every doc.           x           0.2698    0.2678
2     every 100th doc.     x           0.2700    0.2678
3     every 1000th doc.    x           0.2662
4     judged docs only     x           0.2635    0.2677
5     every doc.           x'          0.2654
6     every 100th doc.     x'          0.2677

Table 2: Variations of the reg learning sample and the query features
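The subsampling variants of table 2 amount to thinning out the observation matrix before estimating the coefficients. The paper does not spell out its estimation procedure in this excerpt; the sketch below assumes ordinary least squares over hypothetical observation data X (one feature row per query-term/document pair) and targets y derived from the relevance judgements.

    import numpy as np

    def fit_reg_coefficients(X, y, stride=1):
        # Keep only every stride-th observation, mirroring the
        # 'every 100th / every 1000th document' training variants,
        # then fit the coefficient vector a by ordinary least squares
        # (an assumed estimator; the paper's procedure may differ).
        Xs, ys = X[::stride], y[::stride]
        a, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
        return a

    # e.g. a = fit_reg_coefficients(X, y, stride=100)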
The coefficients computed by the regression process for the second series of experiments are shown in table 3. It is obvious that the coefficients depend heavily on the choice of the training sample, so it is quite surprising that retrieval quality is not affected by this factor. The only coefficient which does not change its sign across all the runs is the one for the tf . idf factor. This seems to confirm the power of this factor. The other factors can be regarded as being only minor modifications of the tf . idf query term weight.

                                   run
factor         1         2         3         4         5         6
constant     -3.05     -2.96      1.04    -20.80      2.47     -1.71
tf           -7.54    -10.67     -2.40     14.58     -4.32     14.11
log tf       -9.01     -5.69     -5.08    -81.58     -1.48     -0.70
tf . idf      5.84      7.00      1.63      8.72      1.81      7.70
is_phrase    -8.06    -19.23     -2.73     19.81     -2.97     20.39
in_title                                             -1.70      3.44

Table 3: Coefficients of query regression
Overall, it must be noted that the regression method does not yield an improvement over the ntc and ltc methods. This seems surprising, since the regression is based on the same factors which also go into the