NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Multilevel Ranking in Large Text Collections Using FAIRS
S-C. Chang
H. Dediu
H. Azzam
M-W. Du
National Institute of Standards and Technology
Donna K. Harman
Table 6: AW2 Recall/Precision Performance

Relation to Median    >          <          =
Recall                17 (34%)   22 (44%)   11
11-pt.                22 (44%)   26 (52%)   2
Among the queries tied with the median in recall, two had no records
judged relevant. For the remaining 9, we used the 11-pt. average as a
tie-breaker; the result was 6 best, 3 worst. Combining the recall and
11-pt. averages, for AW2, FAIRS had 23 submissions on or above the
median (46%) and 25 below (50%), with 2 (4%) undetermined.
3.4.3 RB
Routing, Category B results were submitted by 7 systems. For the
topics judged, 3,766 documents were considered relevant; FAIRS
submitted 5,000 documents in response to 25 queries. Of those
submissions, 1,124 were among the relevant.
The distribution of relevant retrieved (recall) over the 25 topics
was: 2 ranked best, 13 above the median, 6 on the median, and 4
below; in total, 21 were on or above the median and 4 below.
The following graph illustrates the performance index of the recall
rates of FAIRS compared to the group. It shows FAIRS is above average
most of the time. The average recall PI of FAIRS is 65.8.
[Figure: recall PI by query, Queries 1-25]
The next graph shows the performance index of the 11-point average of
FAIRS compared to the group. It again shows FAIRS to be above average
most of the time. The average 11-point-average PI of FAIRS is 61.8.

[Figure: 11-point average PI by query, Queries 1-25]
Table 7: RB Recall/Precision Performance

Relation to Median    >          =          <
Recall                15 (60%)   6 (24%)    4 (16%)
11-pt.                15 (60%)   6 (24%)    4 (16%)
This is the only group that had enough participants to make a
comparative-performance analysis meaningful. We compared our 11-point
average and recall rates for each query to the best, the median, and
the worst scores for that query. The performance index (PI) is
calculated as follows:

    PI = 50 + 50 * (score - median) / (best - median),   if score > median
    PI = 50 * (score - worst) / (median - worst),        if score <= median

PI has the property that a value of 100 means the best was achieved,
50 means the performance is on the median, and 0 means it is the
worst.
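As a minimal sketch (not code from the original system), the PI
computation above can be written directly from its piecewise
definition:

```python
def performance_index(score, best, median, worst):
    """Performance index (PI): 100 at the best score, 50 at the median, 0 at the worst."""
    if score > median:
        return 50 + 50 * (score - median) / (best - median)
    # At score == median this branch also yields 50, so the two pieces agree.
    return 50 * (score - worst) / (median - worst)
```

For example, a query whose score equals the group median gets a PI of
exactly 50 regardless of how wide the best-to-worst spread is.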
3.5 Failure Analysis
Based on the feedback from relevance judgements, we are
considering several improvements in the query handling
and ranking methods. These changes include:
1. Expanding abbreviated terms in the topics via an abbreviation
dictionary. An initial investigation of topics containing
abbreviations revealed that those abbreviations had an appreciably
negative impact on recall. Topic 17 is a good example, where the term
"United States" is abbreviated as "U.S." In a later trial, this
simple expansion alone significantly improved the recall rate for
this topic.
2. Using better term weighting based on heuristics. Up to 50%
improvement was observed when term weighting was modified more
intuitively (by hand).
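The abbreviation-dictionary expansion in item 1 could be sketched as
follows; only the "U.S." entry comes from the paper's example, and the
other entries and function name are illustrative assumptions:

```python
# Hypothetical abbreviation dictionary; only "U.S." is taken from the
# paper's Topic 17 example, the rest is illustrative.
ABBREVIATIONS = {
    "U.S.": "United States",
    "U.N.": "United Nations",
}

def expand_abbreviations(text):
    """Expand each known abbreviation before terms are extracted from a topic."""
    for abbr, full in ABBREVIATIONS.items():
        text = text.replace(abbr, full)
    return text
```

Running the topic text through such a pass before term extraction lets
"U.S." match documents that spell out "United States".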
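The hand-tuned weighting in item 2 might look like the following
sketch; the terms, weights, and default value are invented for
illustration, since the paper does not give its actual values:

```python
# Hypothetical hand-assigned weights; terms and values are illustrative only.
MANUAL_WEIGHTS = {"pesticide": 3.0, "export": 1.5, "the": 0.0}

def score_document(doc_terms, query_terms):
    """Score a document as the sum of hand-tuned weights of matched query terms.

    Terms without an explicit manual weight default to 1.0.
    """
    return sum(MANUAL_WEIGHTS.get(t, 1.0) for t in query_terms if t in doc_terms)
```

Boosting discriminative terms and zeroing out function words in this
way is one plausible reading of "modified more intuitively (by hand)".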