relevant documents, and the non-relevant documents (A, B, C above); and then exactly what terms are to be considered part of the final vector.
In TREC 1, a similar approach originally proposed by Ide was used [8, 13]. That seemed to work well for the fragmentary relevance information available for TREC 1. TREC 2 has considerably more information available, and of much higher quality, so Rocchio's approach is more appropriate.
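In its standard form, Rocchio feedback constructs the new query vector as

    Q_new = A * Q_orig + B * (1/|R|) * SUM_{D in R} D  -  C * (1/|N|) * SUM_{D in N} D

where R and N are the sets of known relevant and known non-relevant documents, and A, B, C are the weights referred to above. (This is the textbook form; the exact normalization used inside SMART is an assumption here.)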
The best parameter values to use for Rocchio's algorithm were investigated by splitting the TREC 1 collection into two parts along the natural D1 (TREC 1 initial collection) and D2 (TREC 1 routing collection) lines. D1 formed the learning set and D2 the evaluation set for a large number of experimental runs determining these parameters. The original TREC 1 routing queries (Q2) are expanded and weighted using Rocchio's algorithm with the relevance information from D1. They are then evaluated by running them against D2 and using the known Q2-D2 relevance information.
Queries are expanded by adding the "best" X single terms and the "best" Y phrases to the original query. We used a simple notion of "best" for TREC 2: terms that occurred in the most relevant documents (ties were broken by considering the highest average weight in the relevant documents).
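A minimal sketch of this selection rule, assuming term vectors are represented as {term: weight} dictionaries (the function name and representation are illustrative, not the SMART internals):

    from collections import defaultdict

    def best_expansion_terms(rel_docs, num_terms):
        # Rank terms by how many relevant documents contain them, breaking
        # ties by the highest average weight in those documents.
        doc_freq = defaultdict(int)
        weight_sum = defaultdict(float)
        for doc in rel_docs:
            for term, weight in doc.items():
                doc_freq[term] += 1
                weight_sum[term] += weight
        ranked = sorted(
            doc_freq,
            key=lambda t: (doc_freq[t], weight_sum[t] / doc_freq[t]),
            reverse=True,
        )
        return ranked[:num_terms]

The same rule applies to phrase vectors when choosing the Y phrases.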
There is a core set of 158 runs using different parameter values for both expansion and weighting. Table 4 gives the six parameter possibilities. The trends noticeable in this investigatory set of runs are:
1. Overall effectiveness increases strongly as the number of terms added increases, up until 200 terms, at which point it starts to level off.
2. Phrases are reasonably important (6% difference) at low single-term expansion numbers, but become less important at higher values (1% difference).
3. As expected, weights in relevant documents are far more important than weights in non-relevant documents.
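The investigatory runs above amount to a sweep over settings of the six parameters, each trained with the D1 judgements and scored against D2. A minimal sketch of how such a sweep could be organized (all helper names and signatures are assumptions, not SMART routines):

    def investigate(settings, queries, d1_judgements, d2_collection, d2_judgements,
                    expand_with_rocchio, run_queries, avg_precision):
        # settings: list of dicts with the keys X, Y, A, B, C, P (as in Table 4).
        scores = {}
        for params in settings:
            expanded = [expand_with_rocchio(q, d1_judgements, params) for q in queries]
            ranked = run_queries(expanded, d2_collection)
            scores[tuple(sorted(params.items()))] = avg_precision(ranked, d2_judgements)
        return scores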
The parameters of our official run, crnlR1, are: adding X = 300 single terms, adding Y = 50 phrases, importance of original query of A = 8,
importance of weight in relevant documents of B = 16, importance of weight in non-relevant documents of C = 4, and relative importance of phrases at retrieval time of P = 0.5.
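A minimal sketch of how these settings combine in the feedback step, reusing the best_expansion_terms sketch above; vectors are {term: weight} dictionaries, the phrase side (Y and P) is handled analogously and omitted, and the exact SMART normalization is an assumption:

    def build_routing_query(query, rel_docs, nonrel_docs, x=300, a=8.0, b=16.0, c=4.0):
        # Keep the original query terms plus the x best expansion terms.
        keep = set(query) | set(best_expansion_terms(rel_docs, x))
        new_query = {t: a * w for t, w in query.items()}
        for doc in rel_docs:
            for t, w in doc.items():
                if t in keep:
                    new_query[t] = new_query.get(t, 0.0) + b * w / len(rel_docs)
        for doc in nonrel_docs:
            for t, w in doc.items():
                if t in keep:
                    new_query[t] = new_query.get(t, 0.0) - c * w / len(nonrel_docs)
        # Terms driven negative by the non-relevant documents are dropped.
        return {t: w for t, w in new_query.items() if w > 0}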
Query-by-Query Parameter Estimation
We examined the results for the 158 test routing runs in more detail, query by query. For each of the 50 queries, we found the best test run. The results (see Table 5) show some interesting patterns not brought out by the overall averages. Not surprisingly, the parameters used for crnlR1 are not best for any single query; they are just a reasonable compromise. There seem to be two main groups of queries: one in which very limited expansion is useful (including 6 queries where no expansion is preferred); and one in which the more terms are added, the better (23 queries with expansion of 500 single terms). If massive expansion is useful, in general the original query is less important than the expanded terms: A is much less than B. There is another, separate distinction between those queries where phrases are useful and those where phrases appear useless: 1 query worked best adding 100 phrases, 6 with 50 added, 2 with 10, 16 using the original phrases only, and 25 using no phrases at all.
If we retrospectively choose the best parameters for each query (something that cannot be done in practice), then we achieve roughly a 10% improvement. This is substantial enough to actually try a predictive run, so our second official run (crnlC1) uses query-by-query choice of parameter values in a predictive (as opposed to retrospective) fashion. The values given in Table 5 were used.
Routing Results
Both crnlR1 and crnlC1 do extremely well in comparison with other TREC 2 routing runs:

    Run       Best   > median   < median
    crnlR1       7         40          3
    crnlC1       5         45          0
Evaluation measures in Table for both the official and some non-official runs show the importance of query expansion. Run 1 is the base case: the original query only (ltc weights).