NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)

Probabilistic Learning Approaches for Indexing and Retrieval with the TREC-2 Collection

N. Fuhr, U. Pfeifer, C. Bremkamp, M. Pollmann

Edited by D. K. Harman, National Institute of Standards and Technology

... of query-document pairs with relevance judgements in order to determine the coefficients a_i. The values y_j can be computed from the number of query terms, the values of the query term features, and the document indexing weights.

For the experiments, the following parameters were considered as query term features:

    x0 = 1 (constant)
    x1 = tf (within-query frequency)
    x2 = log tf
    x3 = tf * idf
    x4 = is_phrase
    x5 = in_title (= 1 if the term occurs in the query title, = 0 otherwise)

For most of our experiments, we used only the parameter vector x = (x0, ..., x4)^T. The full vector, including x5, is denoted as x'. Below, we call this query term weighting method reg.

This method is compared with the standard SMART weighting schemes:

    nnn: c_ik = tf
    ntc: c_ik = tf * idf
    lnc: c_ik = 1 + log tf
    ltc: c_ik = (1 + log tf) * idf

3.2 Experiments

In order to have three different samples for learning and/or testing purposes, we used the following combinations of query sets and document sets as samples: Q3/D12 served as the training sample for the reg method, and both Q1/D12 and Q2/D3 were used for testing. As evaluation measure, we consider the 11-point average of precision (i.e., the average of the precision values at 0.0, 0.1, ..., 1.0 recall).

First, we considered single words only. Table 1 shows the results of the different query term weighting (QTW) methods. It should be noted that the ntc and ltc methods perform better than nnn and lnc. This finding differs somewhat from the results presented in [Fuhr & Buckley 91], where the nnn weighting scheme gave better results than the ntc method for LSP indexing. However, in the earlier experiments we used only fairly small databases, and the queries also were much shorter than in the TREC collection. These facts may account for the different results.

    QTW    Q1/D12    Q2/D3
    nnn    0.2303
    ntc    0.2754
    lnc    0.2291    0.2601
    ltc    0.2826    0.2783
    reg    0.2698    0.2678

    Table 1: Global results for single words

In a second series of experiments, we varied the sample size and the set of features of the regression method (table 2). Besides using every document from the learning sample, we considered only every 100th and every 1000th document from the database, as well as only those documents for which explicit relevance judgements were available. As the results show almost no differences, it seems sufficient to use only a small portion of the database as training sample in order to save computation time. The additional consideration of the occurrence of a term in the query title also did not affect the results, so query titles seem not to be very significant.

    run    learning sample      features    Q1/D12    Q2/D3
    1      every doc.           x           0.2698    0.2678
    2      every 100th doc.     x           0.2700    0.2678
    3      every 1000th doc.    x           0.2662
    4      judged docs only     x           0.2635    0.2677
    5      every doc.           x'          0.2654
    6      every 100th doc.     x'          0.2677

    Table 2: Variations of the reg learning sample and the query features

    factor        run 1     run 2     run 3     run 4     run 5     run 6
    constant      -3.05     -2.96      1.04    -20.80      2.47     -1.71
    tf            -7.54    -10.67     -2.40     14.58     -4.32     14.11
    log tf        -9.01     -5.69     -5.08    -81.58     -1.48     -0.70
    tf * idf       5.84      7.00      1.63      8.72      1.81      7.70
    is_phrase     -8.06    -19.23     -2.73     19.81     -2.97     20.39
    in_title                                               -1.70      3.44

    Table 3: Coefficients of query regression
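To make the reg method and the SMART schemes above concrete, here is a minimal sketch (not from the original paper): it computes the four SMART query term weights from tf and idf, fits a coefficient vector by ordinary least squares over an invented sample of (query term, document) feature vectors with binary relevance judgements, and evaluates the resulting reg weight. The function names, the toy data, and the use of plain least squares on binary judgements are assumptions for illustration; the paper's exact regression setup (e.g., the definition of the target values y_j) may differ.

```python
import math
import numpy as np

def smart_weight(scheme, tf, idf):
    """SMART query term weights c_ik as defined above (tf = within-query frequency)."""
    if scheme == "nnn":
        return tf
    if scheme == "ntc":
        return tf * idf
    if scheme == "lnc":
        return 1.0 + math.log(tf)
    if scheme == "ltc":
        return (1.0 + math.log(tf)) * idf
    raise ValueError(f"unknown scheme: {scheme}")

def features(tf, idf, is_phrase):
    """Feature vector x = (x0, ..., x4) = (1, tf, log tf, tf*idf, is_phrase)."""
    return [1.0, tf, math.log(tf), tf * idf, float(is_phrase)]

# Invented training sample: one feature vector per (query term, document)
# pair, plus binary relevance judgements (1 = relevant, 0 = nonrelevant).
X = np.array([features(2, 1.5, 0),
              features(1, 3.0, 1),
              features(3, 0.5, 0),
              features(1, 2.0, 0),
              features(2, 2.5, 1),
              features(1, 0.8, 0)])
y = np.array([1.0, 1.0, 0.0, 1.0, 1.0, 0.0])

# Least-squares fit of the coefficient vector a (cf. the columns of table 3).
a, *_ = np.linalg.lstsq(X, y, rcond=None)

def reg_weight(a, tf, idf, is_phrase):
    """reg query term weight: dot product of fitted coefficients and x."""
    return float(np.dot(a, features(tf, idf, is_phrase)))

print(smart_weight("ltc", tf=2, idf=1.5))        # SMART ltc weight
print(reg_weight(a, tf=2, idf=1.5, is_phrase=0)) # regression-based weight
```

With the full vector x', a sixth column for in_title would be appended to each feature row, matching the two extra coefficients shown for runs 5 and 6 in table 3.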
It is an open question whether other parts of the queries are more significant, so that their consideration as additional features would affect retrieval quality. The coefficients computed by the regression process for the second series of experiments are shown in table 3. It is obvious that the coefficients depend heavily on the choice of the training sample, so it is quite surprising that retrieval quality is not affected by this factor. The only coefficient which does not change its sign across all runs is the one for the tf * idf factor. This seems to confirm the power of this factor; the other factors can be regarded as only minor modifications of the tf * idf query term weight. Overall, it must be noted that the regression method does not yield an improvement over the ntc and ltc methods. This seems surprising, since the regression is based on the same factors which also go into the