Scientific Report No. IRS-13
Information Storage and Retrieval
Suffix Dictionaries
E. M. Keen
Harvard University
Gerard Salton

Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

These average results may be supplemented by the individual request data given in Figures 5, 6, 7 and 8. Using the normalized recall and precision measures as indicators of merit, it can be seen that 71% to 7[illegible]% of the requests favor stem on IRE-3 (Figure 5), and 53% to 75% of the requests favor stem on ADI abstracts (Figure 7) and text (Figure 8). The Cran-1 result favoring suffix `s' is also confirmed by the figures for the individual requests, with 72% to 77% preferring suffix `s', ignoring those requests which have equal merit for both dictionaries.

Each figure includes plots of both normalized recall and normalized precision for the individual requests. In the case of Cran-1 these plots show that suffix `s' is superior on the average because many of the requests favor suffix `s' by very small amounts. In the IRE-3 and ADI collections, by contrast, the superiority of the stem dictionary over suffix `s' includes some large changes in individual requests.

Performance Analyses

Two phenomena require explanation: firstly, the IRE and ADI runs involving logical vectors and overlap correlation, which sometimes show suffix `s' superior to stem; and secondly, the superiority of suffix `s' on the Cran-1 collection. The first phenomenon is less important than the second, because logical and overlap runs are inferior to cosine numeric runs in any case. Cases where suffix `s' is better than stem must be caused by circumstances of the type considered in part 2, where full suffix removal conflates some words that then match non-relevant documents and thus adversely affects performance.
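The request-by-request percentages quoted earlier in this section (the share of requests favoring one dictionary, with equal-merit requests ignored) can be illustrated with a small sketch. This is only a minimal illustration under stated assumptions, not the procedure actually used in the report: the function name `preference_split`, the tolerance parameter `tol`, and the sample scores are all hypothetical, and the scores stand in for per-request normalized recall (or normalized precision) values.

```python
from typing import Sequence, Tuple


def preference_split(
    stem_scores: Sequence[float],
    suffix_s_scores: Sequence[float],
    tol: float = 0.0,
) -> Tuple[float, float, int]:
    """Return (% of requests favoring stem, % favoring suffix 's', tie count).

    Percentages are taken over the requests that show a preference,
    i.e. requests with equal merit for both dictionaries are ignored.
    `tol` (an assumption, not from the report) treats differences no
    larger than `tol` as ties.
    """
    if len(stem_scores) != len(suffix_s_scores):
        raise ValueError("both dictionaries must be scored on the same requests")

    stem_wins = suffix_wins = ties = 0
    for stem, suffix in zip(stem_scores, suffix_s_scores):
        if abs(stem - suffix) <= tol:
            ties += 1
        elif stem > suffix:
            stem_wins += 1
        else:
            suffix_wins += 1

    decided = stem_wins + suffix_wins
    if decided == 0:
        # Every request is a tie: no preference can be reported.
        return 0.0, 0.0, ties
    return 100.0 * stem_wins / decided, 100.0 * suffix_wins / decided, ties


# Hypothetical per-request scores for five requests (illustrative only).
stem = [0.9, 0.8, 0.5, 0.7, 0.6]
suffix_s = [0.8, 0.8, 0.6, 0.6, 0.5]
result = preference_split(stem, suffix_s)
```

A tally of this kind counts only the direction of each preference, not its size. That is why, as noted for Cran-1, a dictionary can be favored by a large share of requests while its average advantage remains small.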
It was noted in section III that the use of numeric vectors (weighted) gives a clear advantage over logical vectors when a dictionary is in use that includes a reasonably large amount of mapping (i.e., it