Scientific Report No. IRS-13, Information Storage and Retrieval. Suffix Dictionaries (chapter), E. M. Keen. Harvard University, Gerard Salton. Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

conflates many words), and a similar, but unexplained, relationship is noted when the use of cosine is compared to overlap. From a strictly experimental viewpoint, dictionaries such as suffix `s' and stem should be compared without the addition of weighting procedures and cosine, so that the dictionary mapping characteristics may be tested alone. In this case, the overlap logical results show that the stem and suffix `s' dictionaries perform very similarly, and therefore, within the context of the requests and relevance decisions in use, no advantage is to be gained from full suffix recognition as performed automatically. This finding is in accordance with the general conclusions of the second Aslib-Cranfield project [8], although in those results the nearest equivalent to the stem dictionary does perform a little better than suffix `s'.

However, a more practical conclusion in the case of SMART is that stem is the superior dictionary on the IRE-3 and ADI collections, since the cosine correlation and numeric vectors have clearly been shown to be advantageous, and would be advocated for use in any operational version of SMART. The superiority of suffix `s' on Cran-1 is one of several instances where the Cran-1 result differs from the other collections. In the case of Cran-1 the difference in word mapping between suffix `s' and stem is less marked than in the other collections: Figure 9 shows that the Cran-1 stem dictionary includes 8[OCRerr] of the concept classes contained in suffix `s', whereas the IRE-3 and ADI stem dictionaries are based on more mapping characteristics, including only 76% and 74% of suffix `s', respectively.
As expected, this affects the match between requests and documents: Figure 10 shows that at a cosine correlation cut-off of 0.35, the stem dictionary in Cran-1 does not retrieve as many additional documents over suffix `s' as it does in the other collections.
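The effect described above, in which a dictionary that conflates more word forms retrieves additional documents at a fixed cosine cut-off, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from SMART: the function names, the toy mapping that strips only a terminal s, the two-entry stem table, and the example texts are all hypothetical. The overlap function uses one common definition for logical (binary) vectors, shared concepts divided by the size of the smaller vector, which may differ in detail from the measure used in the experiments.

```python
import math
from collections import Counter

def suffix_s(word):
    # Toy stand-in for a suffix dictionary that removes only a terminal "s".
    return word[:-1] if word.endswith("s") and len(word) > 2 else word

# Hypothetical stem table: conflates further word forms onto one concept.
STEM_TABLE = {"heating": "heat", "heated": "heat"}

def stem(word):
    # Toy stand-in for a stem dictionary: strip the "s", then look up the stem.
    base = suffix_s(word)
    return STEM_TABLE.get(base, base)

def vectorize(text, mapper):
    # Numeric concept vector: concept -> frequency of occurrence.
    return Counter(mapper(w) for w in text.lower().split())

def cosine(q, d):
    # Cosine correlation between two sparse numeric vectors.
    dot = sum(w * d.get(c, 0) for c, w in q.items())
    nq = math.sqrt(sum(w * w for w in q.values()))
    nd = math.sqrt(sum(w * w for w in d.values()))
    return dot / (nq * nd) if nq and nd else 0.0

def overlap_logical(q, d):
    # Overlap on logical vectors: shared concepts over the smaller vector size.
    qs, ds = set(q), set(d)
    return len(qs & ds) / min(len(qs), len(ds)) if qs and ds else 0.0

def retrieve(request, documents, mapper, cutoff=0.35):
    # Return the names of documents whose cosine with the request meets the cut-off.
    q = vectorize(request, mapper)
    return sorted(name for name, text in documents.items()
                  if cosine(q, vectorize(text, mapper)) >= cutoff)

# Hypothetical two-document collection and request.
DOCS = {"A": "wings heating", "B": "heating of plates"}
REQUEST = "heated wing"

if __name__ == "__main__":
    for label, mapper in (("suffix s", suffix_s), ("stem", stem)):
        print(label, retrieve(REQUEST, DOCS, mapper))
```

In this toy collection the stem mapping matches "heated" with "heating", so it retrieves a document that the terminal-s mapping misses at the same cut-off. When the two dictionaries map words more alike, as the text reports for Cran-1, fewer such additional documents would be expected.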