ISR11 Scientific Report No. ISR-11 Information Storage and Retrieval Information Analysis and Dictionary Construction chapter G. Salton M. E. Lesk Harvard University Gerard Salton Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government. IV-20

A typical suffix dictionary for English suffixes may contain about 200 entries. To simplify the look-up algorithm, noun suffixes may be entered in the plural as well as the singular form, and adjectival suffixes may also be listed in the adverbial form. Verb suffixes should include the common endings "ed", "ing", and "s" as well as true verb suffixes such as [illegible] with their inflected forms. (Multiple suffixes, such as "fying", could be detected by a dual scanning of the suffix list, looking first for "ing" and then for "fy"; a dual scan is avoided if such multiple suffixes are also entered in the suffix dictionary.)

In general, it is possible to encode word stems and suffixes in such a way that no ambiguity results when the fragments are combined into full words. For example, the stem "recti" is coded as a potential verb because it can form "rectify"; the stem "reduct", on the other hand, is carried without syntax codes, since it can be combined only with common suffixes such as "ion" and "ible", which by themselves are carried as complete homographs, representing respectively "noun singular" and "adjective". In a limited number of cases, partial syntactic coding may introduce an ambiguity: if the word "capital", for example, is coded as a potential verb to accept the suffix "ize", the plural noun "capitals" will receive the extraneous coding of a verb in the third person singular. This difficulty may be prevented by entering the stem "capit" with a partial verb code.
The suffix "als" properly carries with it only the plural noun code, and "capitalize" can then be found by a double scan of the suffix list.[2]
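
The look-up with a double scan of the suffix list can be illustrated by a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the report: the tiny STEMS and SUFFIXES tables, the code labels, the function names scan and analyze, the longest-suffix-first ordering, and the rule that a verb suffix is accepted only by a stem coded as a potential verb. The inflected ending "ing" is simply given the code "verb" to keep the table small.

```python
# Hypothetical stem dictionary: stem -> True if the stem is coded as a potential verb.
STEMS = {"recti": True, "reduct": False, "capit": True}

# Hypothetical suffix dictionary: suffix -> syntax code carried by the suffix itself.
SUFFIXES = {
    "ion": "noun singular",
    "ible": "adjective",
    "al": "noun singular",
    "als": "noun plural",
    "fy": "verb",
    "ize": "verb",
    "ing": "verb",
}


def scan(word):
    """One scan of the suffix list: yield (remainder, suffix), longest suffix first."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) > len(suffix):
            yield word[:-len(suffix)], suffix


def analyze(word):
    """Return (stem, suffixes, syntax code), or None if no decomposition is found."""
    for rest, outer in scan(word):
        code = SUFFIXES[outer]          # the outermost suffix supplies the syntax code
        needs_verb = code == "verb"     # assumed rule: verb suffixes need a verb-coded stem
        if rest in STEMS:
            if needs_verb and not STEMS[rest]:
                continue
            return rest, [outer], code
        # Second scan: the remainder may itself end in a suffix (double scan).
        for stem, inner in scan(rest):
            if stem in STEMS and (STEMS[stem] or not needs_verb):
                return stem, [inner, outer], code
    return None


for w in ("rectify", "rectifying", "reduction", "capitals", "capitalize"):
    print(w, analyze(w))
```

Under these assumptions "capitals" is resolved in a single scan as "capit" plus "als" and receives only the plural noun code, while "capitalize" requires the second scan ("ize", then "al"). Entering "fying" or "alize" directly in the suffix table would make the second scan unnecessary for those words, at the cost of a larger table.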