NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)
Probabilistic Learning Approaches for Indexing and Retrieval with the TREC-2 Collection (chapter)
N. Fuhr, U. Pfeifer, C. Bremkamp, M. Pollmann
National Institute of Standards and Technology, D. K. Harman

4.3 Official runs

Two different runs were submitted for the routing queries, both based on the RPI model. Run dortP1 uses the same document indexing function as for the ad-hoc queries. Query terms were weighted according to the RPI formula. In addition, each query was expanded by 20 single words. Phrases were not downweighted. Run dortV1 is based on ltc document indexing; here, no query expansion took place.

run                                  dortV1       dortP1
document indexing                    ltc          lsp
query expansion                      none         20 terms

average precision:
Prec. Avg.                           0.3516       0.3800

query-wise comparison with median:
Prec. Avg.                           38:10        46:4
Prec. @ 100 docs                     31:11        40:5
Prec. @ 1000 docs                    32:9         37:7

best/worst results:
Prec. Avg.                           1/0          4(2)/0
Prec. @ 100 docs                     3(3)/1(1)    7(5)/1(1)
Prec. @ 1000 docs                    6(2)/0(1)    10(2)/0(1)

dortV1 vs. dortP1:
Prec. Avg.                           10:39
Prec. @ 100 docs                     9:27
Prec. @ 1000 docs                    7:33

Table 10: Results for routing queries

Table 10 shows the results for the two runs. The recall-precision curves are given in figure 2. Again, the results confirm our expectation that LSP indexing and query expansion yield better results.

5 Conclusions and outlook

The experiments described in this paper have shown that probabilistic learning approaches can be applied successfully to different types of indexing and retrieval. For the ad-hoc queries, there still seems to be room for further improvement in the low recall range.
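The precision figures reported in Table 10 are averages over the query set. As a point of reference for how such a figure arises, the following is a minimal sketch of non-interpolated average precision for a single query; it is an illustration only, not the official trec_eval computation, and the document identifiers are hypothetical.

```python
def average_precision(ranking, relevant):
    """Non-interpolated average precision for one query:
    the mean of the precision values observed at the rank
    of each relevant document (simplified sketch)."""
    hits = 0
    precisions = []
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(relevant) if relevant else 0.0

# Hypothetical ranked list with relevant documents d1 and d3:
# d1 at rank 1 contributes 1/1, d3 at rank 3 contributes 2/3.
ap = average_precision(["d1", "d2", "d3", "d4"], {"d1", "d3"})
```

Averaging this value over all queries of a run yields the "Prec. Avg." rows of the table.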
In order to increase precision, a passage-wise comparison of query and document text should be performed. For this purpose, polynomial retrieval functions could be applied. In the case of the routing queries, we first have to investigate methods for parameter estimation in combination with query expansion. However, with the large number of feedback documents given for this task, other types of retrieval models may be more suitable, e.g. query-specific polynomial retrieval functions. Finally, it should be emphasized that we still use rather simple forms of text analysis. Since our methods are flexible enough to work with more sophisticated analysis procedures, this combination seems to be a prospective area of research.

A Operational details of runs

A.1 Basic Algorithms

The algorithm A to find the coefficient vector a for the ad-hoc query term weights can be given as follows:

Algorithm A
1 For each query-document pair (q_k, d_m) in (Q1 u Q2) x D_s, with D_s being a sample from (D1 u D2), do
  1.1 Determine the relevance value r_km of the document d_m with respect to the query q_k.
  1.2 For each term t_i occurring in q_k, do
    1.2.1 Determine the feature vector x_i and the indexing weight u_im of the term t_i w.r.t. the document d_m.
  1.3 For each feature j of the feature vectors x, compute the value of y_j, looping over the terms of the query.
  1.4 Add the vector y and the relevance value r_km to the least squares matrix.
2 Solve the least squares matrix to find the coefficient vector a.

The algorithm B to find the coefficient vector b for the document indexing is sketched here:

Algorithm B
1 Index D1 u D2 (the learning document set) and Q1 u Q2 (the learning query set).
2 For each document d in D1 u D2, do
  2.1 For each query q in Q1 u Q2, do
    2.1.1 Determine the relevance value r of d to q.
    2.1.2 For each term t in common between q^T (the set of query terms) and d^T (the set of document terms), do
      2.1.2.1 Find the values of the elements of the relevance description involved in this run, and add these values plus the relevance information to the least squares matrix being constructed.
3 Solve the least squares matrix to find the coefficient vector b.
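Both algorithms end the same way: rows of feature values and relevance judgements are accumulated, and the resulting least squares system is solved for the coefficient vector. A minimal sketch of that final step, with illustrative stand-in data (the matrix Y and relevance vector r here are invented for the example, not taken from the runs):

```python
import numpy as np

# Stand-in for the accumulated least-squares matrix: each row is a
# feature vector collected for one (query, document) pair, and each
# entry of r is the corresponding relevance value (1 = relevant).
Y = np.array([[1.0, 0.8, 0.2],
              [1.0, 0.1, 0.9],
              [1.0, 0.5, 0.5],
              [1.0, 0.9, 0.1]])
r = np.array([1.0, 0.0, 1.0, 1.0])

# Solve the least-squares problem  min_a ||Y a - r||^2  for the
# coefficient vector a (algorithm A; algorithm B solves the
# analogous system for b).
a, *_ = np.linalg.lstsq(Y, r, rcond=None)

# Retrieval then scores each pair by the scalar product of the
# coefficient vector with its feature vector.
scores = Y @ a
```

In an actual system the matrix would have one row per sampled (query, document) pair, but the solve step is the same.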