SP500215
NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)
The ConQuest System
chapter
P. Nelson
National Institute of Standards and Technology
D. K. Harman
To examine the coarse-grain rank, we constructed graphs
which show its performance more clearly. Since fine-grain
can only work on the results of the coarse-grain algorithm,
what is the loss in recall for coarse-grain?
The following graph shows the cumulative recall percentage
as documents are retrieved from coarse-grain rank. Every
time a relevant document is retrieved, the recall percentage
gradually inches up towards 100%. Note: these tests were
run on just the Category B data.
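As a minimal sketch of how such a cumulative recall curve could be computed, consider the following. The function name, the document identifiers, and the toy relevance judgments are illustrative assumptions and are not taken from the ConQuest system.

```python
from typing import List, Sequence, Set


def cumulative_recall(ranked_ids: Sequence[str], relevant_ids: Set[str]) -> List[float]:
    """Return the recall percentage after each retrieved document, in rank order."""
    if not relevant_ids:
        raise ValueError("query has no relevant documents")
    found = 0
    curve: List[float] = []
    for doc_id in ranked_ids:
        if doc_id in relevant_ids:
            found += 1  # recall rises only when a relevant document appears
        curve.append(100.0 * found / len(relevant_ids))
    return curve


# Hypothetical toy data: five retrieved documents, three relevant documents overall,
# one of which ("d8") is never retrieved.
ranked = ["d7", "d2", "d9", "d4", "d1"]
relevant = {"d2", "d4", "d8"}
curve = cumulative_recall(ranked, relevant)
print([round(r, 1) for r in curve])  # [0.0, 33.3, 33.3, 66.7, 66.7]
```

The curve never decreases, and it reaches 100% only if every relevant document appears somewhere in the ranked list.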
[Graph: cumulative recall percentage versus Documents Retrieved by Coarse-Grain Rank]
Figure 5 Cumulative Recall as Documents are Retrieved
using Coarse-Grain Rank
This figure is an average over all queries. The average
strongly correlates with the results from query #110. This
verifies the two discoveries identified above.
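Averaging over queries can be sketched as a point-wise mean of the per-query recall curves. This is only an illustration: the padding rule for queries with fewer retrieved documents is our assumption, since the averaging procedure is not specified here, and the function name and toy curves are hypothetical.

```python
from typing import List, Sequence


def average_curves(curves: Sequence[Sequence[float]]) -> List[float]:
    """Point-wise mean of per-query cumulative recall curves.

    Assumption: a curve shorter than the longest one is extended with its
    final value, because recall cannot change once a query's retrieved
    list is exhausted.
    """
    if not curves or any(len(c) == 0 for c in curves):
        raise ValueError("need at least one non-empty curve")
    length = max(len(c) for c in curves)
    padded = [list(c) + [c[-1]] * (length - len(c)) for c in curves]
    # zip(*padded) walks the curves rank by rank
    return [sum(column) / len(padded) for column in zip(*padded)]


# Hypothetical curves for two queries (recall percentages by rank).
per_query = [[0.0, 50.0, 100.0], [50.0, 100.0]]
mean_curve = average_curves(per_query)
print(mean_curve)  # [25.0, 75.0, 100.0]
```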
[Graph: cumulative recall percentage versus Documents Retrieved from Coarse-Grain Rank]
Figure 4 Cumulative Recall Percentage for Query #110
Figure 4 shows two exciting discoveries. The first is that
the coarse-grain performance achieves over 95% recall. This
strongly contradicts our initial fears that coarse-grain was
not retrieving enough relevant documents.
The second discovery is that the high recall figures are
achieved quickly. This implies that ConQuest can retrieve
fewer documents (greatly improving speed) and still achieve
high accuracy.
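One way to quantify how quickly recall is achieved is to find the smallest number of retrieved documents that reaches a target recall. The sketch below is illustrative only; the function name, the target values, and the toy data are assumptions rather than part of ConQuest.

```python
from typing import Optional, Sequence, Set


def cutoff_for_recall(
    ranked_ids: Sequence[str], relevant_ids: Set[str], target_pct: float
) -> Optional[int]:
    """Smallest rank (1-based) at which recall reaches target_pct, or None if never."""
    if not relevant_ids:
        raise ValueError("query has no relevant documents")
    found = 0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            found += 1
        # compare without dividing, to avoid rounding at the threshold
        if 100.0 * found >= target_pct * len(relevant_ids):
            return rank
    return None


# Hypothetical toy data: all three relevant documents appear in the top four.
ranked = ["d2", "d4", "d7", "d8", "d1"]
relevant = {"d2", "d4", "d8"}
print(cutoff_for_recall(ranked, relevant, 60.0))   # 2
print(cutoff_for_recall(ranked, relevant, 100.0))  # 4
```

A small cutoff relative to the full retrieved list corresponds to the steep early rise seen in the graph.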
To further establish these claims, we repeated the analysis
on all queries in the TREC-2 topic set, then averaged the
results together, as shown in the next graph:
Some initial studies also more clearly show the difference
between fine-grain and coarse-grain sorting of documents.
The following figure shows both graphs superimposed:
[Graph: Coarse Grain Sorting and Fine Grain Sorting curves superimposed; x-axis: Documents Retrieved from Coarse-Grain Rank]
Figure 6 Coarse-Grain Sorting vs. Fine-Grain Sorting
for TREC-2 Topic #135
In this diagram, we see that fine-grain sorting is in fact
better than coarse-grain. In other queries, the results are
more mixed. Clearly, the difference is not as great as was
initially assumed.
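To compare two orderings of the same retrieved set with a single number, one simple summary is the mean height of the cumulative recall curve. This is a sketch under our own assumptions: the summary measure is not the one used in the study above, and the names and toy orderings are hypothetical.

```python
from typing import Sequence, Set


def mean_cumulative_recall(ranked_ids: Sequence[str], relevant_ids: Set[str]) -> float:
    """Average recall percentage over all ranks; higher means relevant documents come earlier."""
    if not ranked_ids or not relevant_ids:
        raise ValueError("need a non-empty ranking and at least one relevant document")
    found = 0
    total = 0.0
    for doc_id in ranked_ids:
        if doc_id in relevant_ids:
            found += 1
        total += found / len(relevant_ids)
    return 100.0 * total / len(ranked_ids)


# Hypothetical example: the second ordering re-sorts the same four documents.
coarse_order = ["d7", "d2", "d9", "d4"]
fine_order = ["d2", "d4", "d7", "d9"]
relevant = {"d2", "d4"}

coarse_score = mean_cumulative_recall(coarse_order, relevant)
fine_score = mean_cumulative_recall(fine_order, relevant)
print(coarse_score, fine_score)  # 50.0 87.5
```

Because both orderings contain the same documents, they end at the same final recall; only the shape of the curve before that point differs.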
This suggests that the area where ConQuest can most
improve is not the coarse-grain ranking algorithm, but
rather the fine-grain algorithm, or a better combination
of the two.
Upon further study, we believe we now know why. When
the fine-grain algorithm was developed, the programmers
assumed an average query length of about 5 words. Studies
of typical users indicate that their preferred query type is a