NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)
TREC-2 Routing and Ad-Hoc Retrieval Evaluation using the INQUERY System
W. Croft, J. Callan, J. Broglio
Edited by D. K. Harman, National Institute of Standards and Technology
This resulted in considerably better retrieval performance. Additional experiments using
manually edited queries are discussed in the next section.
                        Average Precision
Query Type   5 Docs        30 Docs       100 Docs    11-Pt Avg
INQ003       .64           .56           .45         .35
INQ004       .67 (+3.7%)   .58 (+2.7%)   .45 (0%)    .36 (+2.4%)

Table 2: Results for routing queries
The routing results show that some improvement is obtained by combining the manual
queries with the queries that were automatically modified using relevance feedback tech-
niques. The difference in performance between the two types of queries is considerably less
than last year, however. Our own experiments have also shown that no additional gains in
performance were obtained by using more than the top 150 documents from the INQUERY
output. This is a significant result from a practical viewpoint, since in an operational
environment we would not want to rely on having output from other systems, or to need
thousands of relevance judgements, before performance improves.
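The kind of run combination described above, merging a manual-query run with an automatically modified (relevance feedback) run, can be sketched as a score-level merge of two ranked lists. This is an illustrative sketch only; the function and run names are hypothetical and do not come from the INQUERY system.

```python
# Hypothetical sketch: combine a manual-query run with a
# relevance-feedback run by averaging normalized document scores.

def combine_runs(run_a, run_b, weight_a=0.5):
    """Merge two {doc_id: score} runs into one ranked list.

    Scores are min-max normalized per run so the two evidence
    sources contribute on a comparable scale before averaging.
    """
    def normalize(run):
        lo, hi = min(run.values()), max(run.values())
        span = (hi - lo) or 1.0
        return {d: (s - lo) / span for d, s in run.items()}

    a, b = normalize(run_a), normalize(run_b)
    docs = set(a) | set(b)
    combined = {d: weight_a * a.get(d, 0.0) + (1 - weight_a) * b.get(d, 0.0)
                for d in docs}
    # Highest combined score first
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)

manual = {"d1": 3.2, "d2": 1.1, "d3": 0.4}
feedback = {"d2": 2.5, "d3": 2.0, "d4": 0.7}
print(combine_runs(manual, feedback))
```

A document ranked moderately in both runs (d2 here) can outrank one that appears in only a single run, which is the intended effect of combining independent query formulations.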
5 Other Experiments
In the TIPSTER 24-month evaluation, which took place soon after the TREC-2 evaluation,
we did a number of experiments that complement those done in TREC. In particular, we
evaluated paragraph-based retrieval, expansion using an automatically generated thesaurus,
feedback techniques that use phrases, and Japanese indexing techniques. In this section, we
report some of the most interesting results. The precision figures given here are calculated
using the TREC-2 relevance judgements, rather than the TIPSTER judgements.
The first two experiments were with ad-hoc queries. INQ041 (the numbers are consistent
with those used in TIPSTER and other publications) is a run that used a different manually
modified version of INQ001. That is, the manual modifications were the same as those done
in the first TIPSTER and TREC evaluations, rather than the more restricted modifications
done for INQ002. INQ042 is a run that combines INQ041 with INQ001.
                        Average Precision
Query Type   5 Docs        30 Docs       100 Docs      11-Pt Avg
INQ041       .68           .60           .50           .36
INQ042       .65 (-4.6%)   .61 (+1.7%)   .51 (+2.0%)   .38 (+5.6%)

Table 3: Results for TIPSTER ad-hoc queries
These results show that the manually modified queries can achieve significantly better
precision at low recall levels. For example, at the 5 document cutoff level, the average
precision for INQ041 is 9.7% higher than that of INQ001. The overall average is the same,
however. This is a much smaller difference than was seen in the first TREC and TIPSTER
evaluations.
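The measures reported in the tables above, precision at a fixed document cutoff and the 11-point interpolated average precision, can be computed as in the following sketch. The function names are illustrative; this is not the official evaluation code.

```python
# Minimal sketch of the evaluation measures reported above:
# precision at a document cutoff and 11-point interpolated
# average precision.

def precision_at_k(ranked_docs, relevant, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    return sum(1 for d in ranked_docs[:k] if d in relevant) / k

def eleven_point_avg(ranked_docs, relevant):
    """Average of interpolated precision at recall 0.0, 0.1, ..., 1.0."""
    pr = []  # (recall, precision) observed after each retrieved document
    hits = 0
    for i, d in enumerate(ranked_docs, start=1):
        if d in relevant:
            hits += 1
        pr.append((hits / len(relevant), hits / i))
    points = []
    for level in [i / 10 for i in range(11)]:
        # interpolated precision: max precision at any recall >= level
        candidates = [p for r, p in pr if r >= level]
        points.append(max(candidates) if candidates else 0.0)
    return sum(points) / 11

ranked = ["d1", "d2", "d3", "d4"]
relevant = {"d1", "d3"}
print(precision_at_k(ranked, relevant, 2))   # one of the top 2 is relevant
print(eleven_point_avg(ranked, relevant))
```

Precision at a cutoff rewards systems that place relevant documents very early in the ranking, while the 11-point average summarizes the whole precision-recall curve, which is why the two measures can move in different directions, as in Table 3.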