Scientific Report No. IRS-13 Information Storage and Retrieval
Suffix Dictionaries
E. M. Keen
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
A second example is the term "compressible", used in the aerodynamics
literature, which is kept separately from "compressibility".
It appears that amendments to the automatic procedures used could
solve at least some of these problems, and it is certain that for every such
problem there are at least ten cases of correct conflation. Examination
of the groups of words that are related by this conflating procedure suggests
that the majority are helpful for document retrieval. A distinction between
"computer" and "computing" is not believed to be useful, and preservation
of the two forms is unlikely to be helpful to a requester. An exception to
this situation may be furnished by the inclusion of a noun with the adjectival
and verbal forms. Although the practice of using a "computer" is
related to the "computer" itself, a request for documents describing one
named computer may not perform well if documents describing computational
procedures are highly matched with the request.
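
The conflation of "computer" and "computing" discussed above can be sketched
in a few lines of Python. This is only an illustrative sketch: the suffix list,
the minimum stem length, and the function names `stem` and `conflate` are
assumptions made for the example and are not the suffix dictionary or the
procedure used in the report.

```python
from collections import defaultdict

# Hypothetical suffix list (an assumption, not the report's suffix dictionary).
SUFFIXES = ["ing", "ers", "er", "ed", "es", "s", "e"]
MIN_STEM = 4  # assumed minimum number of letters a stem must keep


def stem(word):
    """Strip the longest matching suffix that leaves a long enough stem."""
    word = word.lower()
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: -len(suffix)]
    return word


def conflate(words):
    """Group words that share a stem, keeping the input order in each group."""
    groups = defaultdict(list)
    for word in words:
        groups[stem(word)].append(word)
    return dict(groups)


groups = conflate(["computer", "computing", "computers", "computed", "compute"])
print(groups)
```

Under these assumptions all five forms fall into one group, so a request
naming a particular computer would match documents about computational
procedures equally well, which is the situation described above.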
The performance results presented suggest that this type of unwelcome
conflation is a contributing factor to the poor performance of the stem
dictionary on the Cran-1 aerodynamics collection. The words "compressor"
and "compressors", for example, are unhelpfully grouped with "compressible"
and "compression", when notions such as "jet engine compressor", "compressible
flow", and "compression buckling" are quite unrelated. Naturally any
hand-produced dictionary, such as the thesaurus dictionaries described in section
VII, can easily handle such conflation problems, but the claim for
automatically generated dictionaries is that cases of failure are few enough to
justify the large saving in effort of construction. This general claim seems
to be potentially far better justified by the automatically generated thesaurus-
type dictionaries produced by statistical association (see section VIII and
appendix C), since hand construction of a stem dictionary would require little
effort if an exhaustive concordance of the collection were available.
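
The "compressor" case above, and the way a hand-made entry can repair it, may
be sketched as follows. The suffix list, the `OVERRIDES` table, and the
function names are hypothetical and serve only to illustrate the idea; they
are not taken from the stem dictionary described in the report.

```python
# Hypothetical suffix list, checked in the order given (an assumption).
SUFFIXES = ("ibility", "ible", "ion", "ors", "or")

# Hypothetical hand-made entries that keep the engine part apart
# from the flow and buckling senses.
OVERRIDES = {"compressor": "compressor", "compressors": "compressor"}


def auto_stem(word):
    """Purely automatic stemming: remove the first suffix that matches."""
    for suffix in SUFFIXES:
        if word.endswith(suffix):
            return word[: -len(suffix)]
    return word


def stem_with_overrides(word):
    """Consult the hand-made table first, then fall back to auto_stem."""
    return OVERRIDES.get(word, auto_stem(word))


words = ["compressor", "compressors", "compressible", "compression"]
print({w: auto_stem(w) for w in words})            # all map to "compress"
print({w: stem_with_overrides(w) for w in words})  # compressor(s) kept apart
```

The automatic version places all four words under the single stem "compress",
while the two hand-made entries are enough to separate the unrelated notion.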