Scientific Report No. IRS-13
Information Storage and Retrieval
Harvard University
Gerard Salton

Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

VI. Suffix Dictionaries
E. M. Keen

1. Introduction

The use of suffix removal procedures as a simple method of vocabulary control is investigated with two types of suffix dictionaries. The need for vocabulary control and the desirability of synonym and partial synonym recognition are discussed in Section I. A suffix removal procedure has been part of the SMART system from its inception; it has been known as the "null thesaurus" but is here described as the stem dictionary. A second type of dictionary recently tested is the "suffix 's' dictionary", which offers the most basic language analysis method, involving virtually no vocabulary control; as such, the suffix 's' method provides a convenient "base-line" from which dictionaries exerting greater control can be evaluated. A brief description of the two dictionaries will be given, together with retrieval performance comparisons and an analysis of the results.

2. Description of Suffix Dictionaries

Both the suffix 's' and stem dictionaries are automatically generated, and the suffix removal procedure and collection look-up operations have been described elsewhere [1,2,3,4,5,6]. Briefly, the full suffix removal process (stem dictionary) is carried out in two stages: first, the construction of a dictionary of word stems, formed by applying a hand-made list of suffixes to a body of text; and second, a look-up process which uses the dictionary of word stems plus certain spelling rules to reduce the document texts to