relevant documents, and the non-relevant documents (A, B, C above); and then exactly what terms are to be considered part of the final vector. In TREC 1, a similar approach originally proposed by Ide was used [8, 13]. That seemed to work well for the fragmentary relevance information available for TREC 1. TREC 2 has considerably more information available, and of much higher quality, so Rocchio's approach is more appropriate.

The best parameter values to use for Rocchio's algorithm were investigated by splitting the TREC 1 collection into two parts along the natural D1 (TREC 1 initial collection) and D2 (TREC 1 routing collection) lines. D1 formed the learning set and D2 the evaluation set for a large number of experimental runs determining these parameters. The original TREC 1 routing queries (Q2) are expanded and weighted using Rocchio's algorithm with the relevance information from D1. They are then evaluated by running them against D2 and using the known Q2-D2 relevance information.

Queries are expanded by adding the "best" X single terms and the "best" Y phrases to the original query. We used a simple notion of "best" for TREC 2: terms that occurred in the most relevant documents (ties were broken by considering the highest average weight in the relevant documents). A sketch of this expansion and reweighting procedure appears below.

There is a core set of 158 runs using different parameter values for both expansion and weighting. Table 4 gives the six parameter possibilities. The trends noticeable in this investigatory set of runs are:

1. Overall effectiveness increases strongly as the number of terms added increases, up until 200 terms, at which point it starts to level off.

2. Phrases are reasonably important (6% difference) at low single-term expansion numbers, but become less important at higher values (1% difference).

3. As expected, weights in relevant documents are far more important than weights in non-relevant documents.

The parameters of our official run, crnlR1, are: adding X = 300 single terms, adding Y = 50 phrases, importance of the original query of A = 8, importance of weight in relevant documents of B = 16, importance of weight in non-relevant documents of C = 4, and relative importance of phrases at retrieval time of P = 0.5.

Query-by-Query Parameter Estimation

We examined the results for the 158 test routing runs in more detail, query by query. For each of the 50 queries, we found the best test run. The results (see Table 5) show some interesting patterns not brought out by the overall averages. Not surprisingly, the parameters used for crnlR1 are not best for any single query; they are just a reasonable compromise. There seem to be two main groups of queries: one in which very limited expansion is useful (including 6 queries where no expansion at all is preferred), and one in which the more terms are added, the better (23 queries with expansion of 500 single terms). If massive expansion is useful, the original query is in general less important than the expanded terms: A is much less than B. There is another, separate distinction between queries where phrases are useful and those where phrases appear useless: 1 query worked best adding 100 phrases, 6 with 50 added, 2 with 10, 16 using the original phrases only, and 25 using no phrases at all.
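To make the expansion and reweighting procedure concrete, the following is a minimal sketch of Rocchio feedback over single terms, using the selection criterion described above (terms occurring in the most relevant documents, ties broken by average weight in those documents). It assumes simple dictionary-based term vectors; the function name and data layout are illustrative assumptions rather than the SMART implementation, and phrase expansion (the Y and P parameters) is omitted.

```python
from collections import defaultdict

def rocchio_expand(query, rel_docs, nonrel_docs, X, A, B, C):
    """Expand and reweight a query vector with Rocchio feedback.

    query, and each document in rel_docs / nonrel_docs, is a dict
    mapping term -> weight.  X is the number of new single terms to
    add; A, B, C weight the original query, the relevant documents,
    and the non-relevant documents respectively.
    """
    # Rank candidate terms by the number of relevant documents they
    # occur in, breaking ties by average weight in those documents.
    doc_freq = defaultdict(int)
    weight_sum = defaultdict(float)
    for doc in rel_docs:
        for term, w in doc.items():
            doc_freq[term] += 1
            weight_sum[term] += w
    candidates = [t for t in doc_freq if t not in query]
    candidates.sort(key=lambda t: (doc_freq[t], weight_sum[t] / doc_freq[t]),
                    reverse=True)
    expanded_terms = set(query) | set(candidates[:X])

    # Rocchio reweighting: A * original + B * mean relevant weight
    # - C * mean non-relevant weight, dropping non-positive weights.
    new_query = {}
    for term in expanded_terms:
        rel_avg = sum(d.get(term, 0.0) for d in rel_docs) / max(len(rel_docs), 1)
        nonrel_avg = sum(d.get(term, 0.0) for d in nonrel_docs) / max(len(nonrel_docs), 1)
        w = A * query.get(term, 0.0) + B * rel_avg - C * nonrel_avg
        if w > 0:
            new_query[term] = w
    return new_query
```

Under this sketch, the crnlR1 settings correspond to X = 300, A = 8, B = 16, and C = 4, with the 50 added phrases (weighted by P = 0.5 at retrieval time) handled analogously in a separate phrase vector.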
If we retrospectively choose the best parameters for each query (something that cannot be done in practice), then we achieve roughly a 10% improvement. This is substantial enough to warrant trying a predictive run, so our second official run (crnlC1) uses query-by-query choice of parameter values in a predictive (as opposed to retrospective) fashion. The values given in Table 5 were used.

Routing Results

Both crnlR1 and crnlC1 do extremely well in comparison with the other TREC 2 routing runs:

  Run      Best   > median   < median
  crnlR1     7       40          3
  crnlC1     5       45          0

Evaluation measures in Table for both the official and some non-official runs show the importance of query expansion. Run 1 is the base case: the original query only (ltc weights).
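The "ltc" label for the base case refers to SMART's standard weighting notation: logarithmic term frequency, idf, and cosine normalization. As a point of reference, a minimal sketch of that weighting under its usual formulation is given below; the dictionary-based inputs are illustrative assumptions, not SMART's actual data structures.

```python
import math

def ltc_vector(term_counts, doc_freq, num_docs):
    """Weight a document or query vector with SMART-style 'ltc':
    l: 1 + log(tf)   t: log(N/df)   c: cosine normalization.

    term_counts maps term -> raw term frequency, doc_freq maps
    term -> number of documents containing the term, num_docs is N.
    """
    weights = {}
    for term, tf in term_counts.items():
        if tf > 0 and doc_freq.get(term, 0) > 0:
            weights[term] = (1.0 + math.log(tf)) * math.log(num_docs / doc_freq[term])
    # Cosine-normalize so that vector length is 1.
    norm = math.sqrt(sum(w * w for w in weights.values()))
    if norm > 0:
        weights = {t: w / norm for t, w in weights.items()}
    return weights
```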