Scientific Report No. IRS-13 Information Storage and Retrieval
Search Matching Functions
E. M. Keen
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
and ADI give the same result. The exceptions are the stem dictionary on
Cran-1 abstracts and titles, and the suffix 's' dictionary on ADI abstracts
and text. Figures 18, 19, 20 and 21 present precision versus recall graphs for
the stem and thesaurus dictionaries on the IRE-3 collection (Figure 18),
the Cran-1 collection (Figure 19), and the ADI collection using text (Figure 20)
and abstracts (Figure 21). General merit strongly favors numeric; the only
exceptions are the low-recall, high-precision region for Cran-1 stem and the
small differences between the curves for ADI abstracts stem. Since both the
normalized recall and normalized precision measures show ADI text suffix 's'
preferring logical vectors, Figure 22 gives a precision versus recall graph
for this run together with ADI abstracts suffix 's'. The graphs show numeric
to be superior on both plots up to 0.8 recall; the discrepancy between the
merit indicated by the normalized measures and that shown by the graphs of
the standard measures is considered in Section II.
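The normalized recall and normalized precision measures referred to above reduce a whole ranked output to one number by comparing the ranks at which the relevant documents are actually retrieved with the ideal ranks 1, 2, ..., n. The sketch below assumes the standard SMART definitions, with N the collection size, n the number of relevant documents and r_i the rank of the i-th relevant document; the function names are hypothetical and are not taken from the report.

```python
import math
from typing import Sequence


def _check(ranks: Sequence[int], n_docs: int) -> int:
    """Validate the input and return n, the number of relevant documents."""
    n = len(ranks)
    if n == 0 or n >= n_docs:
        raise ValueError("need 0 < n < N relevant documents")
    if len(set(ranks)) != n or min(ranks) < 1 or max(ranks) > n_docs:
        raise ValueError("ranks must be distinct integers in 1..N")
    return n


def normalized_recall(ranks: Sequence[int], n_docs: int) -> float:
    """R_norm = 1 - (sum r_i - sum i) / (n * (N - n)).

    ranks:  1-based ranks at which the relevant documents were retrieved.
    n_docs: N, the number of documents in the collection.
    """
    n = _check(ranks, n_docs)
    actual = sum(ranks)
    ideal = n * (n + 1) // 2  # sum of the ideal ranks 1..n
    return 1.0 - (actual - ideal) / (n * (n_docs - n))


def normalized_precision(ranks: Sequence[int], n_docs: int) -> float:
    """P_norm = 1 - (sum log r_i - sum log i) / log(N! / ((N - n)! n!))."""
    n = _check(ranks, n_docs)
    actual = sum(math.log(r) for r in ranks)
    ideal = sum(math.log(i) for i in range(1, n + 1))
    # log of the binomial coefficient C(N, n), computed with lgamma so that
    # large collections do not overflow the factorials.
    log_binom = (math.lgamma(n_docs + 1) - math.lgamma(n_docs - n + 1)
                 - math.lgamma(n + 1))
    return 1.0 - (actual - ideal) / log_binom
```

A ranking that places every relevant document first scores 1 on both measures, and the reverse ranking scores 0. Because both sums run over every relevant document, a few relevant documents retrieved very late lower the normalized values even when precision at low and middle recall is high; the logarithms make normalized precision weight early ranks more heavily than normalized recall does.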
Comparisons of individual request merit are given in Figure 23, where
76.5% to 88.2% of the requests favor numeric on IRE-3, 51.14% to 77.8%
favor numeric on Cran-1, and 45.[illegible]% to 65.[illegible]% favor
numeric on ADI.
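The per-request figures above are percentages of requests for which the numeric run scores higher than the logical run, given as a range for each collection (presumably the spread over the different evaluation measures or dictionaries). A minimal sketch of that tabulation follows. The function names, the input layout (one score per request for each run and measure) and the treatment of ties as not favoring numeric are all assumptions, since the text does not say how ties were counted.

```python
from typing import Mapping, Sequence, Tuple


def percent_favoring_numeric(numeric: Sequence[float],
                             logical: Sequence[float]) -> float:
    """Percentage of requests whose numeric-vector score beats the logical one.

    Assumption: a tie counts as *not* favoring numeric.
    """
    if len(numeric) != len(logical) or not numeric:
        raise ValueError("need one numeric and one logical score per request")
    wins = sum(1 for n_score, l_score in zip(numeric, logical)
               if n_score > l_score)
    return 100.0 * wins / len(numeric)


def favoring_range(
        scores_by_measure: Mapping[str, Tuple[Sequence[float], Sequence[float]]]
) -> Tuple[float, float]:
    """Lowest and highest percentage favoring numeric, taken over all measures.

    scores_by_measure maps a (hypothetical) measure name to the pair
    (numeric scores, logical scores), one entry per request.
    """
    pcts = [percent_favoring_numeric(num, logi)
            for num, logi in scores_by_measure.values()]
    return min(pcts), max(pcts)
```

Whether ties are dropped, split, or counted against numeric changes the percentages, so reproducing the figures would require matching whatever convention the original tabulation used.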
C) Analysis of Performance
The thesaurus dictionaries show a greater improvement for numeric over
logical than do the stem and suffix 's' dictionaries; a specific reason for
this is suggested by the data in Figure 24. Using four ADI dictionaries
and the ADI text results, it appears that numeric gives the largest gains in
performance over logical with dictionaries that contain few concept classes.
The dictionary with the smallest number of classes is an exception to this
for four of the performance measures used; however, this dictionary performs
worse than the stem dictionary, which explains the discrepancy. The grouping
of words achieved by a thesaurus provides a greater