NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)
Edited by D. K. Harman, National Institute of Standards and Technology

Design and Evaluation of the CLARIT-TREC-2 System
D. Evans
R. Lefferts
Table 2 gives the official CLARIT-TREC-2 system ad-hoc query results as reported by NIST. A graph of the precision-recall curves for the two sets of results is given in Figure 3. The total number of relevant documents retrieved under the ad-hoc query task was 8,229 (CLARTM, the manual run) and 8,109 (CLARTA, the automatic run), representing, respectively, 76.30% and 75.19% of the total known relevants (10,785).
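These percentages are simply the counts of relevant documents retrieved divided by the pool of known relevants; as a check of the arithmetic:

    8,229 / 10,785 = 0.7630 (76.30%)
    8,109 / 10,785 = 0.7519 (75.19%)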
The graph in Figure 4 shows the average precision score for each process at N documents, for selected values of N. It should be noted that the maximum possible precision score at 500 and 1,000 documents is less than 100%. In particular, the average number of relevants per routing topic is 209.78; this corresponds to a maximum precision of 41.96% at 500 documents and 20.98% at 1,000 documents. The average number of relevants per ad-hoc query topic is 215.70; this corresponds to a maximum precision of 43.14% at 500 documents and 21.57% at 1,000 documents.
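In general, for a topic with R known relevant documents, the precision attainable at a fixed cutoff of N retrieved documents is bounded by

    P_max(N) = min(1, R / N)

so, for example, 209.78 / 500 = 0.4196 (41.96%) and 215.70 / 1,000 = 0.2157 (21.57%).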
Tables 3 and 4 provide another view of total performance. The numbers in each cell give the number of times the CLARIT-TREC-2 system produced results above, equal to, or below the median for all TREC-participant systems. Numbers in brackets give the instances of `extreme' performance (best and worst) among all systems. For the routing topics, for example, CLARIT retrieval results at 1,000 documents were better than the median 36 times in both "manual" and "automatic" modes; CLARIT scored the maximum 10 and 11 times, respectively. For the ad-hoc query topics, CLARIT retrieval results at 1,000 documents were better than the median 44 times in "manual" mode and 42 times in "automatic" mode; CLARIT scored the maximum 4 and 9 times, respectively.
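A tally of this kind is easy to reproduce from per-topic scores. The following sketch counts how often a run falls above, at, or below the per-topic median, and how often it is the best or worst among all systems; the input dictionaries our_scores and all_systems_scores are hypothetical stand-ins for the official per-topic tables, not data from this paper:

from statistics import median

def median_tally(our_scores, all_systems_scores):
    """Count topics where a run is above/at/below the per-topic median.

    our_scores: {topic_id: score} for the run being tallied.
    all_systems_scores: {topic_id: [scores of all participant systems,
    including this run]}.  Both inputs are hypothetical stand-ins.
    """
    above = equal = below = best = worst = 0
    for topic, ours in our_scores.items():
        others = all_systems_scores[topic]
        m = median(others)
        if ours > m:
            above += 1
        elif ours < m:
            below += 1
        else:
            equal += 1
        best += ours == max(others)   # `extreme' high performance
        worst += ours == min(others)  # `extreme' low performance
    return {"above": above, "equal": equal, "below": below,
            "best": best, "worst": worst}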
3.2 CLARIT Automatic vs. Manual Modes of Processing
In both tasks (routing and ad-hoc querying), CLARIT-TREC-2 automatic processing results are virtually identical to manual results. This confirms our hypothesis that the principal contribution to performance derives from (1) the base-level CLARIT process (using linguistic phrases as information units) and (2) the effect of query augmentation via thesaural terms. On this latter point, we note that, on average, the final query vector for a topic will contain many more terms that derive from thesaurus extraction than terms that derive from the source topic. In general, then, when reliable information is available (as in sample known relevants or highly-likely relevants returned in a first-pass retrieval), the CLARIT process will succeed in finding good supplemental terminology for a topic and the overall effects of manual intervention will be minimized.[1]
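As a rough illustration of this style of augmentation (a minimal sketch, not the CLARIT thesaurus-discovery procedure itself; the frequency-based scoring and fixed term weight are placeholder assumptions), the following merges the most common new terms from a feedback set of documents into a query vector:

from collections import Counter

def augment_query(query_terms, feedback_docs,
                  n_new_terms=50, new_term_weight=0.5):
    """Add terms drawn from feedback documents to a query vector.

    query_terms: {term: weight} derived from the source topic.
    feedback_docs: list of token lists (known- or nominated-relevant
    documents).  Scoring by document frequency and the fixed weight
    for new terms are illustrative assumptions only.
    """
    candidates = Counter()
    for doc in feedback_docs:
        candidates.update(set(doc))  # document frequency of each term
    # Keep the most frequent terms not already in the query.
    new_terms = [t for t, _ in candidates.most_common()
                 if t not in query_terms][:n_new_terms]
    augmented = dict(query_terms)
    for t in new_terms:
        augmented[t] = new_term_weight
    return augmented

With n_new_terms large relative to the topic vocabulary, the augmented vector is dominated by extracted terms, matching the observation above.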
Figures 5 and 6 illustrate the relative absence of a positive effect for manual intervention in the selection and weighting of query terms. There are approximately as many instances of decreased performance as there are instances of increased performance. Most topics show very little percentage difference in numbers of documents returned;[2] this is especially underscored in the results for routing topics at 1,000 documents.
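One plausible form of the per-topic comparison plotted in these figures (the exact definition used for the figures is an assumption here) is the relative difference

    diff(topic) = 100 * (R_manual(topic) - R_automatic(topic)) / R_automatic(topic)

where R_mode(topic) is the number of relevant documents returned for the topic under the given processing mode.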
4 Analysis
4.1 CLARIT Precision
As in TREC-1, CLARIT precision-recall curves demonstrate very high precision at low percentages of recall. The first few documents returned by the system are extremely likely to be relevant for the given topic. This property of CLARIT processing was successfully exploited in the TREC-2 processing method: query augmentation was possible because there was, in general, a good concentration of topic-relevant information among the sub-documents of the first-pass returned documents. As shown in Figure 4, precision remains quite stable for all methods across the first 30 documents retrieved and is relatively high across the full retrieved set of 1,000 documents.
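Precision at a cutoff, the quantity plotted in Figure 4, is simple to state; a minimal sketch (function and variable names are ours):

def precision_at_k(ranked_doc_ids, relevant_ids, k):
    """Fraction of the top-k retrieved documents that are relevant.

    Dividing by k (rather than by the number of relevants) is what
    produces the precision ceilings discussed above when k exceeds
    the number of relevant documents for a topic.
    """
    top_k = ranked_doc_ids[:k]
    return sum(1 for d in top_k if d in relevant_ids) / k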
5 Query Augmentation Experiments
A distinguishing feature of the CLARIT-TREC-2 system is the use of fully-automatic query augmentation. As noted above, the selection of terms for query augmentation depends on (1) the selection of a source set of known- or nominated-relevant documents and (2) the application of the CLARIT thesaurus-discovery procedure. Since the size (and quality) of the source set of documents can vary and since CLARIT thesaurus-discovery processing can be adjusted to nominate relatively greater or fewer numbers of terms, the `query-augmentation' facet of the CLARIT process is a natural source of potential variation in system performance.
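These two degrees of freedom suggest a simple experimental grid. The sketch below runs one retrieval evaluation per combination of source-set size and number of nominated terms; the grid values and the evaluate callable are hypothetical, not the settings used in the experiments reported here:

from itertools import product

def sweep(evaluate,
          source_set_sizes=(10, 20, 40),
          terms_to_nominate=(25, 50, 100)):
    """Evaluate retrieval once per grid point and collect the scores.

    evaluate: callable (n_source_docs, n_new_terms) -> score.  In a
    real experiment it would build the augmented query, run retrieval,
    and score the results against the relevance judgments.
    """
    return {(n_docs, n_terms): evaluate(n_docs, n_terms)
            for n_docs, n_terms in product(source_set_sizes,
                                           terms_to_nominate)}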
[1] Of course, there may be some forms of manual intervention, not utilized in the CLARIT `manual' process, that would have effects dramatically better than the CLARIT automatic process. We know of no such process that can be applied efficiently to arbitrary topics and databases, however.

[2] Indeed, even in the absolute number of documents returned for each topic (not shown in the figures) there is very little difference.