Scientific Report No. IRS-13 Information Storage and Retrieval
Suffix Dictionaries
E. M. Keen
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
VI. Suffix Dictionaries
E. M. Keen
1. Introduction
The use of suffix removal procedures as a simple method of vocabulary
control is investigated with two types of suffix dictionaries. The need for
vocabulary control and the desirability of synonym and partial synonym
recognition are discussed in Section I. A suffix removal procedure, which has
been known as the "null thesaurus" but is here described as the stem
dictionary, has been incorporated into the SMART system from its inception. A
second type of dictionary, recently tested, is the "suffix 's' dictionary";
this offers the most basic language analysis method, involving virtually
no vocabulary control, and as such the suffix 's' method provides a convenient
"base-line" from which dictionaries exerting greater control can be evaluated.
A brief description of the two dictionaries will be given, together with
retrieval performance comparisons and an analysis of the results.
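
As a rough illustration of the contrast between the two dictionaries, the
following minimal sketch compares removal of a terminal "s" only with removal
of suffixes taken from a short list. It is not the SMART procedure: the suffix
list, the function names suffix_s and stem, and the minimum stem length are
hypothetical choices made for this example, the behaviour of the suffix 's'
method is assumed from its name, and spelling rules are not modelled.

```python
# Illustrative suffix list only; NOT the hand-made list used in SMART.
SUFFIXES = ("ations", "ation", "ing", "ed", "es", "s")


def suffix_s(word: str) -> str:
    """Strip one terminal 's' (assumed behaviour of the suffix 's' method)."""
    w = word.lower()
    if len(w) > 3 and w.endswith("s") and not w.endswith("ss"):
        return w[:-1]
    return w


def stem(word: str, min_stem: int = 3) -> str:
    """Strip the longest matching suffix from the illustrative list."""
    w = word.lower()
    # Try longer suffixes first so "ations" wins over "s".
    for suf in sorted(SUFFIXES, key=len, reverse=True):
        if w.endswith(suf) and len(w) - len(suf) >= min_stem:
            return w[: -len(suf)]
    return w


words = ["document", "documents", "documented", "documenting"]
print([suffix_s(w) for w in words])
print([stem(w) for w in words])
```

Under these assumptions, the terminal "s" rule merges only "document" and
"documents", while the list-based rule maps all four forms to a single stem,
which is the sense in which the latter exerts more vocabulary control.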
2. Description of Suffix Dictionaries
Both the suffix 's' and stem dictionaries are automatically generated,
and the suffix removal procedure and collection look-up operations have
been described elsewhere [1,2,3,4,5,6]. Briefly, the full suffix removal
process (stem dictionary) is carried out in two stages: first, the construction
of a dictionary of word stems, formed by applying a hand-made list of suffixes
to a body of text; and second, a look-up process which uses the dictionary
of word stems plus certain spelling rules to reduce the document texts to