NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Multilevel Ranking in Large Text Collections Using FAIRS
S-C. Chang, H. Dediu, H. Azzam, M-W. Du
National Institute of Standards and Technology: Donna K. Harman

The importance weights were assigned with a minimum of attention. As described below, failure analysis later showed that large improvements in precision could be made if pre-processing were better tuned with term expansion, duplicate removal, and a heuristic importance measure. After conversion, queries were submitted to the retrieval engine through a non-interactive interface. The time for processing (ranking) was approximately 30 minutes per query. Again, I/O wait was the dominant factor in running time. The same system solutions that apply to indexing can be applied with equal effectiveness to query processing. Modification of some of the internal data structures for index storage (concordance-list optimization) could also improve ranking running time.

3.4 Retrieval Performance

FAIRS participated in the following categories: Ad-hoc Wall Street Journal disk 1 (AW1), Ad-hoc Wall Street Journal disk 2 (AW2), and Routing, Category B (RB). Relevance judgements are as follows: AW1: 50 topics, AW2: 50, RB: 25; total 125. Of these judgements, FAIRS' recall rates were ranked as follows: 77 on or above median, 45 below median, and 3 undetermined (1 tied, 2 with 0 relevant); that is, 61.6% on or above median, 36% below, 2.4% undetermined.

Table 3: Recall Performance Summary

Category   % >= med   % < med
AW1        66.0       32.0
AW2        46.0       50.0
RB         84.0       16.0
Total      61.6       36.0
% rel ret  54~

Besides recall, the relevance judgements made available recall/precision figures across 11 points of recall rates. The average of the 11 recall/precision figures is called the 11-pt. average. FAIRS' 11-pt. average rates were ranked as follows: 79 on or above median, 43 below median, and 3 undetermined. The percentages for 11-pt.
averages are: 63.2% on or above, 34.4% below, 2.4% undetermined.

Table 4: 11-pt. Average Performance Summary

Category   % >= med   % < med
AW1        85.7       31.0
AW2        45.8       54.2
RB         84.0       16.0
Total      63.2       34.4

3.4.1 AW1

Ad-hoc WSJ disk 1: results were submitted by three systems. In this category, across all systems' submissions, 4,056 documents were judged relevant; 10,000 were submitted by FAIRS in response to 50 queries, of which 1,561 were among the relevant. The average 11-point average (over 50 queries, with 11 recall rates for each query) was 0.2083. The distribution of relevant retrieved (recall) over the 50 topics was: 20 ranked best, 12 on the median, 16 worst, 2 tied. The distribution of 11-pt. averages over the 50 topics was: 22 ranked best, 14 on the median, 13 worst, 1 tied.

Table 5: AW1 Recall/Precision Performance

           Best       Med        Worst
Recall     20 (40%)   13 (26%)   16 (32%)
11-Pt.     22 (44%)   14 (28%)   12 (24%)

Total recall placed FAIRS second among 3 participants, and first in 11-pt. average recall/precision.

3.4.2 AW2

Ad-hoc WSJ disk 2: results were submitted by two systems. In this category 2,172 documents were judged relevant; 10,000 were submitted in response to 50 queries, of which 1,188 were among the relevant. The 11-point average precision was 0.2216. The distribution of relevant retrieved (recall) over the 50 topics was: 17 ranked best, 22 worst, 9 tied, 2 unevaluated. Of the ties, the 11-pt. averages favored FAIRS 6 times and the other system 3 times.
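The 11-pt. average and the summary percentages above are simple arithmetic. A minimal sketch of both computations (not code from the FAIRS system; the per-topic precision figures below are hypothetical, only the counts 77/45/3 come from the text):

```python
# Standard 11 recall levels at which precision is measured per topic.
RECALL_LEVELS = [i / 10 for i in range(11)]  # 0.0, 0.1, ..., 1.0

def eleven_pt_average(precision_at_recall):
    """Mean of the 11 precision figures for one topic."""
    assert len(precision_at_recall) == 11
    return sum(precision_at_recall) / 11

# Hypothetical precision figures at the 11 recall levels for one topic.
p = [0.8, 0.7, 0.6, 0.5, 0.4, 0.35, 0.3, 0.2, 0.1, 0.05, 0.0]
topic_avg = eleven_pt_average(p)  # 4.0 / 11 = 0.3636...

# The category-level figure (e.g. 0.2083 for AW1) is the mean of the
# per-topic 11-pt. averages over that category's topics.

# Summary percentages, as in the recall ranking: 77 of 125 topics on or
# above the median gives 61.6%.
on_or_above, below, undetermined = 77, 45, 3
total = on_or_above + below + undetermined  # 125
pct_on_or_above = round(on_or_above / total * 100, 1)  # 61.6
```

The rounding to one decimal place matches how the percentages are reported in Tables 3 and 4.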