NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)
TREC-2 Routing and Ad-Hoc Retrieval Evaluation using the INQUERY System
W. Croft, J. Callan, J. Broglio
Edited by D. K. Harman, National Institute of Standards and Technology
This resulted in considerably better retrieval performance. Additional experiments using
manually edited queries are discussed in the next section.
                        Average Precision
Query Type   5 Docs        30 Docs       100 Docs    11-Pt Avg
INQ003       .64           .56           .45         .35
INQ004       .67 (+3.7%)   .58 (+2.7%)   .45 (0%)    .36 (+2.4%)

Table 2: Results for routing queries
The routing results show that some improvement is obtained by combining the manual
queries with the queries that were automatically modified using relevance feedback tech-
niques. The difference in performance between the two types of queries is considerably less
than last year, however. Our own experiments have also shown that no additional gains in
performance were obtained by using more than the top 150 documents from the INQUERY
output. This is a significant result from a practical viewpoint, since in an operational
environment we would not want to rely on having output from other systems, or to need
thousands of relevance judgements, before performance improves.
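The kind of run combination described above, merging a manual-query run with an automatically modified (relevance feedback) run, can be sketched as a score-level merge of two ranked lists. This is an illustrative sketch only; the function and run names are hypothetical and do not come from the INQUERY system.

```python
# Hypothetical sketch: combine a manual-query run with a
# relevance-feedback run by averaging normalized document scores.

def combine_runs(run_a, run_b, weight_a=0.5):
    """Merge two {doc_id: score} runs into one ranked list.

    Scores are min-max normalized per run so the two evidence
    sources contribute on a comparable scale before averaging.
    """
    def normalize(run):
        lo, hi = min(run.values()), max(run.values())
        span = (hi - lo) or 1.0
        return {d: (s - lo) / span for d, s in run.items()}

    a, b = normalize(run_a), normalize(run_b)
    docs = set(a) | set(b)
    combined = {d: weight_a * a.get(d, 0.0) + (1 - weight_a) * b.get(d, 0.0)
                for d in docs}
    # Highest combined score first
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)

manual = {"d1": 3.2, "d2": 1.1, "d3": 0.4}
feedback = {"d2": 2.5, "d3": 2.0, "d4": 0.7}
print(combine_runs(manual, feedback))
```

A document ranked moderately in both runs (d2 here) can outrank one that appears in only a single run, which is the intended effect of combining independent query formulations.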
5 Other Experiments
In the TIPSTER 24-month evaluation, which took place soon after the TREC-2 evaluation,
we did a number of experiments that complement those done in TREC. In particular, we
evaluated paragraph-based retrieval, expansion using an automatically generated thesaurus,
feedback techniques that use phrases, and Japanese indexing techniques. In this section, we
report some of the most interesting results. The precision figures given here are calculated
using the TREC-2 relevance judgements, rather than the TIPSTER judgements.
The first two experiments were with ad-hoc queries. INQ041 (the numbers are consistent
with those used in TIPSTER and other publications) is a run that used a different manually
modified version of INQ001. That is, the manual modifications were the same as those done
in the first TIPSTER and TREC evaluations, rather than the more restricted modifications
done for INQ002. INQ042 is a run that combines INQ041 with INQ001.
                        Average Precision
Query Type   5 Docs        30 Docs       100 Docs      11-Pt Avg
INQ041       .68           .60           .50           .36
INQ042       .65 (-4.6%)   .61 (+1.7%)   .51 (+2.0%)   .38 (+5.6%)

Table 3: Results for TIPSTER ad-hoc queries
These results show that the manually modified queries can achieve significantly better
precision at low recall levels. For example, at the 5 document cutoff level, the average
precision for INQ041 is 9.7% higher than that of INQ001. The overall average is the same,
however. This is a much smaller difference than was seen in the first TREC and TIPSTER
evaluations.
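The measures reported in the tables above, precision at a fixed document cutoff and the 11-point interpolated average precision, can be computed as in the following sketch. The function names are illustrative; this is not the official evaluation code.

```python
# Minimal sketch of the evaluation measures reported above:
# precision at a document cutoff and 11-point interpolated
# average precision.

def precision_at_k(ranked_docs, relevant, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    return sum(1 for d in ranked_docs[:k] if d in relevant) / k

def eleven_point_avg(ranked_docs, relevant):
    """Average of interpolated precision at recall 0.0, 0.1, ..., 1.0."""
    pr = []  # (recall, precision) observed after each retrieved document
    hits = 0
    for i, d in enumerate(ranked_docs, start=1):
        if d in relevant:
            hits += 1
        pr.append((hits / len(relevant), hits / i))
    points = []
    for level in [i / 10 for i in range(11)]:
        # interpolated precision: max precision at any recall >= level
        candidates = [p for r, p in pr if r >= level]
        points.append(max(candidates) if candidates else 0.0)
    return sum(points) / 11

ranked = ["d1", "d2", "d3", "d4"]
relevant = {"d1", "d3"}
print(precision_at_k(ranked, relevant, 2))   # one of the top 2 is relevant
print(eleven_point_avg(ranked, relevant))
```

Precision at a cutoff rewards systems that place relevant documents very early in the ranking, while the 11-point average summarizes the whole precision-recall curve, which is why the two measures can move in different directions, as in Table 3.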