IRS13
Scientific Report No. IRS-13 Information Storage and Retrieval
Suffix Dictionaries
chapter
E. M. Keen
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
3. Retrieval Performance Results
Comparisons of the suffix 's' and stem dictionaries are presented
for the three document collections, using the normalized measures, precision
versus recall graphs and data from individual requests. Figure 1 gives ten
results using the normalized recall and precision measures. The ADI results
include text, abstract and title results, and some results are displayed
both for the ADI and IRE-3 collections with overlap correlation and logical
vectors. All IRE-3 results and four of the six ADI results show the stem
dictionary to have higher normalized values, although by quite small amounts.
The single Cranfield result and the ADI text cosine and overlap logical runs
show suffix 's' to be the superior dictionary.
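
The normalized measures compared above can be illustrated with a minimal sketch. It assumes the usual rank-based definitions of normalized recall and normalized precision for a single request, where the ranks of the relevant documents are compared against the best possible ranking; the function and parameter names are hypothetical and are not taken from the report.

```python
import math


def normalized_recall(relevant_ranks, collection_size):
    """Normalized recall for one request (assumed rank-based definition).

    relevant_ranks: 1-based ranks at which the relevant documents were retrieved.
    collection_size: total number of documents in the collection.
    """
    n = len(relevant_ranks)
    big_n = collection_size
    if n == 0 or n >= big_n:
        raise ValueError("need at least one relevant and one non-relevant document")
    actual = sum(relevant_ranks)
    ideal = sum(range(1, n + 1))  # relevant documents ranked 1..n
    return 1.0 - (actual - ideal) / (n * (big_n - n))


def normalized_precision(relevant_ranks, collection_size):
    """Normalized precision for one request (assumed log-rank definition)."""
    n = len(relevant_ranks)
    big_n = collection_size
    if n == 0 or n >= big_n:
        raise ValueError("need at least one relevant and one non-relevant document")
    actual = sum(math.log(r) for r in relevant_ranks)
    ideal = sum(math.log(i) for i in range(1, n + 1))
    # log of N! / ((N - n)! n!), computed with lgamma to avoid huge factorials
    worst = math.lgamma(big_n + 1) - math.lgamma(big_n - n + 1) - math.lgamma(n + 1)
    return 1.0 - (actual - ideal) / worst
```

Both functions return 1 when every relevant document is ranked ahead of every non-relevant one and 0 when the relevant documents are ranked last. Averaging the values over all requests of a collection would give one figure per dictionary, which is the kind of comparison reported in Figure 1.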
Four results are given using precision versus recall graphs: IRE-3
Figure 2(a), Cran-1 Figure 2(b), ADI Abstracts Figure 3(a) and ADI Text
Figure 3(b). These results confirm those in Figure 1, and the Cran-1
result is seen to favor suffix 's' over the whole range of the curve. To
complete all the runs given in Figure 1 in terms of precision and recall,
a table is given in Figure 4 that summarizes six more precision/recall plots
not presented in detail, by recording the precision merit at three levels
of recall. Some disagreement between these results and the normalized measures
may be noted, and the reasons for this are discussed in section II. The
cases of disagreement all consist of very small differences in merit between
suffix 's' and stem, and all the more valuable comparisons which use the
cosine correlation and numeric vectors display consistent results. The average
performance measures show, therefore, that stem is superior to suffix 's'
on the IRE-3 and ADI collections, and that suffix 's' is the better dictionary
on the Cran-1 collection.
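
The idea of recording precision at a few levels of recall can be sketched as follows. This is only an illustration under stated assumptions: the three recall levels (0.25, 0.5, 0.75) are hypothetical, since this section does not state which levels the table uses, and precision is read at the first rank where the recall level is reached, with no interpolation or averaging over requests.

```python
def precision_at_recall_levels(relevance_flags, levels=(0.25, 0.5, 0.75)):
    """Precision at the first rank where each recall level is reached.

    relevance_flags: sequence of 0/1 values in ranked order (1 = relevant).
    levels: recall levels to report; the defaults are an assumption,
            not the levels used in the report.
    """
    total_relevant = sum(relevance_flags)
    if total_relevant == 0:
        raise ValueError("the request has no relevant documents")
    result = {}
    for level in levels:
        found = 0
        for rank, flag in enumerate(relevance_flags, start=1):
            found += flag
            if found / total_relevant >= level:
                result[level] = found / rank  # precision at this cutoff
                break
    return result


# One hypothetical ranked output with four relevant documents.
ranking = [1, 0, 1, 0, 0, 1, 0, 1]
summary = precision_at_recall_levels(ranking)
```

Computing such a summary for each dictionary on the same requests gives a compact way to compare two precision versus recall curves without plotting them in full.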