Scientific Report No. ISR-11 Information Storage and Retrieval
Information Analysis and Dictionary Construction
G. Salton
M. E. Lesk
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
A typical dictionary of English suffixes may contain about 200 entries.
To simplify the look-up algorithm, noun suffixes may be entered in the
plural as well as the singular form, and adjectival suffixes may also be
listed in the adverbial form. Verb suffixes should include the common
endings "ed", "ing", and "s" as well as true verb suffixes such as [...],
together with their inflected forms. (Multiple suffixes, such as "fying",
can be detected by a dual scan of the suffix list, looking first for "ing"
and then for "fy"; the dual scan is avoided if such multiple suffixes are
also entered in the suffix dictionary.)
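The look-up just described may be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure used in the report: the toy stem and suffix lists, the function name `split_word`, and the longest-ending-first search with backtracking are all hypothetical choices. Each scan strips one dictionary suffix, and the parameter `max_scans` bounds the number of scans.

```python
from typing import Optional

# Hypothetical toy dictionaries for illustration; the report's suffix list has about 200 entries.
STEMS = {"recti", "reduct", "capit"}
SUFFIXES = {"ed", "ing", "s", "fy", "ize", "al", "als", "tion", "ible"}


def split_word(word: str, stems: set, suffixes: set,
               max_scans: int = 2) -> Optional[tuple]:
    """Split word into (stem, [suffix, ...]) using at most max_scans suffix scans.

    Each scan strips one dictionary suffix, trying the longest ending first,
    and backtracks if the remainder cannot be resolved. Returns None on failure.
    """
    if word in stems:
        return word, []
    if max_scans == 0:
        return None
    for n in range(len(word) - 1, 0, -1):  # longest candidate ending first
        ending = word[-n:]
        if ending in suffixes:
            found = split_word(word[:-n], stems, suffixes, max_scans - 1)
            if found is not None:
                stem, inner = found
                return stem, inner + [ending]
    return None


print(split_word("rectifying", STEMS, SUFFIXES))                 # ('recti', ['fy', 'ing'])
print(split_word("rectifying", STEMS, SUFFIXES, max_scans=1))    # None
print(split_word("rectifying", STEMS, SUFFIXES | {"fying"}, 1))  # ('recti', ['fying'])
```

With a single scan the word "rectifying" cannot be resolved from the toy lists, but once the multiple suffix "fying" is itself entered in the dictionary one scan suffices, which is the trade-off noted in the text.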
In general, it is possible to encode word stems and suffixes in such
a way that no ambiguity results when the fragments are combined into full
words. For example, the stem "recti" is coded as a potential verb because
it can form "rectify"; the stem "reduct", on the other hand, is carried
without syntax codes, since it can be combined only with common suffixes
such as "tion" and "ible", which by themselves are carried as complete
homographs, representing "noun singular" and "adjective", respectively.
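The coding rule can be illustrated with a small Python sketch. The code labels (`POTENTIAL_VERB`, `NOUN_SG`, `ADJ`, `VERB`), the table layout, and the function `word_codes` are hypothetical, since the report does not specify how the codes are stored. In this sketch a suffix either carries its codes by itself, as "tion" and "ible" do, or requires a matching code on the stem, as the verb suffix "fy" requires a potential-verb stem.

```python
# Hypothetical code labels: the report does not list its actual syntax codes.
STEM_CODES = {
    "recti": {"POTENTIAL_VERB"},  # coded as a potential verb: can form "rectify"
    "reduct": set(),              # carried without syntax codes
}

# suffix -> (stem code the suffix requires, or None; codes the full word receives)
SUFFIX_CODES = {
    "tion": (None, {"NOUN_SG"}),         # complete homograph: noun singular
    "ible": (None, {"ADJ"}),             # complete homograph: adjective
    "fy": ("POTENTIAL_VERB", {"VERB"}),  # verb suffix: needs a potential-verb stem
}


def word_codes(stem, suffix):
    """Return the syntax codes of stem + suffix, or None if the pair may not combine."""
    required, codes = SUFFIX_CODES[suffix]
    if required is not None and required not in STEM_CODES[stem]:
        return None
    return set(codes)


print(word_codes("recti", "fy"))     # {'VERB'}
print(word_codes("reduct", "tion"))  # {'NOUN_SG'}
print(word_codes("reduct", "fy"))    # None
```

Because the full-word codes come from the suffix alone, an uncoded stem such as "reduct" never contributes a spurious code of its own.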
In a limited number of cases, partial syntactic coding may introduce
an ambiguity: if the word "capital", for example, is coded as a potential
verb so that it can accept the suffix "ize", the plural noun "capitals"
will receive the extraneous coding of a verb in the third person singular.
This difficulty may be prevented by entering the stem "capit" with a
partial verb code. The suffix "als" then properly carries with it only the
plural noun code, and "capitalize" can be found by a double scan of the
suffix list.[2]
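The ambiguity and its remedy can be traced with another hypothetical Python sketch. The code labels, the rule table, and the function `word_codes` are invented for illustration, and the combination rule is simplified so that only the final suffix assigns codes. The segmentation "capit" + "al" + "ize" is supplied directly; in the text it results from the double scan of the suffix list.

```python
# Hypothetical rules: suffix -> list of (stem code required, code given to the word).
# A requirement of None means the suffix carries the code by itself.
SUFFIX_RULES = {
    "s": [("NOUN", "NOUN_PL"), ("POTENTIAL_VERB", "VERB_3SG")],
    "als": [(None, "NOUN_PL")],
    "ize": [("POTENTIAL_VERB", "VERB"), ("PARTIAL_VERB", "VERB")],
}


def word_codes(stem_codes, stem, suffixes):
    """Codes for stem + suffixes; only the final suffix assigns codes (a simplification)."""
    own = stem_codes[stem]
    return {code for required, code in SUFFIX_RULES[suffixes[-1]]
            if required is None or required in own}


# Option 1: "capital" entered as a noun that is also a potential verb.
OPTION_1 = {"capital": {"NOUN", "POTENTIAL_VERB"}}
# Option 2: the stem "capit" entered with a partial verb code.
OPTION_2 = {"capit": {"PARTIAL_VERB"}}

print(sorted(word_codes(OPTION_1, "capital", ["s"])))        # ['NOUN_PL', 'VERB_3SG']
print(sorted(word_codes(OPTION_2, "capit", ["als"])))        # ['NOUN_PL']
print(sorted(word_codes(OPTION_2, "capit", ["al", "ize"])))  # ['VERB']
```

Under the first option the plural "capitals" picks up the extraneous verb code; under the second, "als" yields only the plural noun code while "capitalize" still receives its verb code.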