NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Multilevel Ranking in Large Text Collections Using FAIRS
S-C. Chang
H. Dediu
H. Azzam
M-W. Du
National Institute of Standards and Technology
Donna K. Harman
The importance weights were assigned with a minimum of
attention. As described below, failure analysis later showed
that great improvements in precision could be made if
preprocessing were better tuned with term expansion,
duplicate removal, and a heuristic importance measure.
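The three preprocessing steps named above can be sketched as follows. This is an illustrative sketch only, not the FAIRS implementation: the synonym table and the length-based weight heuristic are assumptions made for the example.

```python
# Hypothetical expansion table; FAIRS' actual expansion source is not described here.
SYNONYMS = {"retrieval": ["search", "lookup"]}

def preprocess(terms):
    """Expand query terms, remove duplicates, and assign heuristic weights."""
    # Term expansion: add synonyms for each query term.
    expanded = []
    for t in terms:
        expanded.append(t)
        expanded.extend(SYNONYMS.get(t, []))
    # Duplicate removal: keep the first occurrence of each term.
    seen, unique = set(), []
    for t in expanded:
        if t not in seen:
            seen.add(t)
            unique.append(t)
    # Heuristic importance measure (an assumption for illustration):
    # weight longer terms slightly higher, on the guess that they are rarer.
    return {t: 1.0 + 0.1 * len(t) for t in unique}

weights = preprocess(["text", "retrieval"])
```

Any of the three steps can be tuned independently, which is what the failure analysis above suggests doing.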
After conversion, queries were submitted to the retrieval
engine through a non-interactive interface. Processing
(ranking) took approximately 30 minutes per query.
Again, I/O wait was the dominant factor in running
time. The same system solutions that apply to indexing
can be applied with equal effectiveness to query process-
ing. Modification of some of the internal data structures
for index storage (concordance list optimization) could
also improve ranking running time.
3.4 Retrieval Performance
FAIRS participated in the following categories: Ad-hoc
Wall Street Journal Disk 1 (AW1), Ad-hoc Wall Street
Journal Disk 2 (AW2), and Routing, Category B (RB). Rele-
vance judgements are as follows: AW1: 50 topics, AW2:
50, RB: 25; total 125.
Of these judgements, FAIRS' recall rates were ranked as
follows: 77 on or above the median, 45 below the median, and 3
undetermined (1 tied, 2 with 0 relevant documents); that is,
61.6% on or above the median, 36% below, 2.4% undetermined.
Table 3: Recall Performance Summary

Category   % >= med   % < med
AW1        66.0       32.0
AW2        46.0       50.0
RB         84.0       16.0
Total      61.6       36.0
Besides recall, the relevance judgements made available recall/
precision figures at 11 points of recall. The average
of the 11 recall/precision figures is called the 11-pt.
average. FAIRS' 11-pt. average rates were ranked as fol-
lows: 79 on or above the median, 43 below the median, and 3
undetermined. The percentages for 11-pt. averages are:
63.2% on or above, 34.4% below, 2.4% undetermined.
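The 11-pt. average described above can be computed as follows. This is a minimal sketch of the standard interpolated measure (precision taken at the 11 recall levels 0.0, 0.1, ..., 1.0 and averaged); the ranked list and relevance set in the usage line are made up for illustration.

```python
def eleven_point_average(ranked, relevant):
    """ranked: document ids in rank order; relevant: set of relevant ids."""
    # Record (recall, precision) after each retrieved document.
    pr = []
    hits = 0
    for i, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
        pr.append((hits / len(relevant), hits / i))
    # Interpolated precision at recall level r: the maximum precision
    # observed at any recall >= r (0.0 if that recall is never reached).
    levels = [i / 10 for i in range(11)]
    interp = []
    for r in levels:
        ps = [p for (rec, p) in pr if rec >= r]
        interp.append(max(ps) if ps else 0.0)
    # Average over the 11 recall levels.
    return sum(interp) / 11

avg = eleven_point_average(["d1", "d2", "d3", "d4"], {"d1", "d3"})
```

Averaging this per-query figure over all topics in a category yields the per-category numbers reported below.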
Table 4: 11-pt. Average Performance Summary

Category   % >= med   % < med
AW1        85.7       31.0
AW2        45.8       54.2
RB         84.0       16.0
Total      63.2       34.4
3.4.1 AW1
For ad-hoc WSJ Disk 1, results were submitted by three sys-
tems. In this category, across all systems' submissions, 4,056
documents were judged relevant; 10,000 were submitted
by FAIRS in response to 50 queries, of which 1,561 were
among the relevant. The average 11-point average (over 50
queries, over 11 recall rates for each query) was 0.2083.
The distribution of relevant retrieved (recall) over the 50
topics was: 20 ranked best, 12 on the median, 16 worst, 2
tied. The distribution of 11-pt. averages over the 50 topics
was: 22 ranked best, 14 on the median and 13 worst, 1
tied.
Table 5: AW1 Recall/Precision Performance

          Best       Med        Worst
Recall    20 (40%)   13 (26%)   16 (32%)
11-Pt.    22 (44%)   14 (28%)   12 (24%)
Total recall placed FAIRS second among the 3 participants,
and first in 11-pt. average recall/precision.
3.4.2 AW2
For ad-hoc WSJ Disk 2, results were submitted by two sys-
tems. In this category 2,172 documents were judged rele-
vant; 10,000 were submitted in response to 50 queries, of
which 1,188 were among the relevant. The 11-point aver-
age precision was 0.2216. The distribution of relevant
retrieved (recall) over the 50 topics was: 17 ranked best,
22 worst, 9 tied, 2 unevaluated. Of the tied topics, the 11-pt.
averages favored FAIRS 6 times and the other system 3 times.