NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
CLARIT TREC Design, Experiments, and Results
D. Evans, R. Lefferts, G. Grefenstette, S. Handerson, W. Hersh, A. Archbold
National Institute of Standards and Technology
Edited by Donna K. Harman
                  >Median   =Median   <Median
11pt Average       33 [3]      3      13 [2]
Rels in Top 100    31 [4]      6      12 [1]
Rels in Top 200    33 [3]      4      12 [1]

Note: On average, 53% of the total relevant documents were in the "A" sets of 2000.

Table 1: Summary of Results for Routing (Topics 1-50)
                  >Median   =Median   <Median
11pt Average       34 [7]      2      14 [0]
Rels in Top 100    32 [7]      3      15 [0]
Rels in Top 200    31 [5]      2      17 [0]

Note: On average, 39% of the total relevant documents were in the "A" sets of 2000.

Table 2: Summary of Results for Ad-Hoc Queries (Topics 51-100)
Except where otherwise indicated, all reported results and all analyses in the following
sections are based on the CLARIT "B" results.
5.1 General Summary of Results
Tables 1 and 2 give the results of applying the techniques described above to the routing
topics, 1-50, and to the ad-hoc query topics, 51-100. The numbers in each cell give the
number of times the CLARIT-TREC system produced results above, equal to, or below the
median for all TREC-participant systems. Numbers in brackets give the instances of "extreme"
performance (best and worst) among all systems.
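As a minimal sketch of how such a summary is tallied (with made-up scores, not TREC data), the cells of Tables 1 and 2 amount to a per-topic comparison of one system's score against the per-topic median over all participating systems:

```python
# Hypothetical illustration: count topics where a system scored above,
# equal to, or below the per-topic median over all participating systems.
# Topic IDs and scores here are invented, not TREC results.

def tally_vs_median(system_scores, median_scores):
    """Return (above, equal, below) counts over all topics."""
    above = equal = below = 0
    for topic, score in system_scores.items():
        m = median_scores[topic]
        if score > m:
            above += 1
        elif score == m:
            equal += 1
        else:
            below += 1
    return above, equal, below

# Three illustrative topics:
system = {1: 0.42, 2: 0.30, 3: 0.55}
median = {1: 0.40, 2: 0.30, 3: 0.60}
print(tally_vs_median(system, median))  # (1, 1, 1)
```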
For the routing topics, the quick partitioning of documents (which produced our "A" set of
2000 candidate documents per topic out of the approximately 300,000 possible documents in
the second data set) captured 53% of all the documents judged relevant by the TREC judges.
These candidate sets were then processed by the baseline CLARIT-TREC system and ranked
results were produced. The results of this ranking were better than the median results of all
systems tested at TREC for more than 30 of the topics, according to the measures of average
precision ("11pt Average"), the number of relevant documents in the top 100 returned, and the
number of relevant documents in the top 200 returned. For three or four topics, CLARIT's
results were the best of all systems tested in TREC for routing.
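The two ranking measures cited above can be sketched as follows. This is a minimal illustration under the standard definitions of interpolated 11-point average precision and relevants-in-top-k, not the official TREC evaluation code; the ranking and relevance judgments are invented:

```python
# Hypothetical sketch of the two measures used in Tables 1 and 2.
# Document IDs and judgments are made up for illustration.

def rels_in_top_k(ranking, relevant, k):
    """Count relevant documents among the first k retrieved."""
    return sum(1 for doc in ranking[:k] if doc in relevant)

def eleven_pt_avg_precision(ranking, relevant):
    """Interpolated precision averaged at recall 0.0, 0.1, ..., 1.0."""
    points_seen = []  # (recall, precision) after each retrieved relevant doc
    hits = 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            points_seen.append((hits / len(relevant), hits / rank))
    total = 0.0
    for level in [r / 10 for r in range(11)]:
        # Interpolated precision: max precision at any recall >= level
        ps = [p for r, p in points_seen if r >= level]
        total += max(ps) if ps else 0.0
    return total / 11

ranking = ["d3", "d1", "d7", "d2", "d9"]
relevant = {"d1", "d2"}
print(rels_in_top_k(ranking, relevant, 3))               # 1
print(eleven_pt_avg_precision(ranking, relevant))        # 0.5
```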
For the ad-hoc queries, the "A" sets of 2000 candidate documents per query contained only 39%
of the documents judged relevant by the human assessors, yet the discrimination phase of
CLARIT produced even better results than for routing. For seven of the 50 queries, CLARIT
processing produced the best ranking of all systems tested, both in terms of average precision
and in terms of the number of relevant documents in the first 100 documents returned.
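The 53% and 39% figures quoted above are the recall of the candidate ("A") sets with respect to the judged-relevant documents. A minimal sketch, with invented document IDs:

```python
# Hypothetical illustration of candidate-set recall: what fraction of the
# judged-relevant documents survived the quick partitioning step.
# Document IDs below are invented, not TREC data.

def candidate_set_recall(candidates, relevant):
    """Fraction of judged-relevant documents present in the candidate set."""
    if not relevant:
        return 0.0
    return len(set(candidates) & set(relevant)) / len(relevant)

candidates = {"d1", "d4", "d5", "d8"}
relevant = {"d1", "d2", "d5", "d9"}
print(candidate_set_recall(candidates, relevant))  # 0.5
```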