Scientific Report No. IRS-13
Information Storage and Retrieval
Suffix Dictionaries (chapter)
E. M. Keen
Harvard University
Gerard Salton

Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

A second example is the term "compressible", used in the aerodynamics literature, which is kept separately from "compressibility". It appears that amendments to the automatic procedures used could solve at least some of these problems, and it is certain that for every such problem there are at least ten cases of correct conflation.

Examination of the groups of words that are related by this conflating procedure suggests that the majority are helpful for document retrieval. A distinction between "computer" and "computing" is not believed to be useful, and preservation of the two forms is unlikely to be helpful to a requester. An exception to this situation may be furnished by the inclusion of a noun with the adjectival and verbal forms. Although the practice of using a "computer" is related to the "computer" itself, a request for documents describing one named computer may not perform well if documents describing computational procedures are highly matched with the request.

The performance results presented suggest that this type of unwelcome conflation is a contributing factor to the poor performance of the stem dictionary on the Cran-1 aerodynamics collection. The words "compressor" and "compressors", for example, are unhelpfully grouped with "compressible" and "compression", when notions such as "jet engine compressor", "compressible flow", and "compression buckling" are quite unrelated. Naturally any hand-produced dictionary, such as the thesaurus dictionaries described in section VII, can easily handle such conflation problems, but the claim for automatically generated dictionaries is that cases of failure are few enough to justify the large saving in effort of construction.
This general claim seems to be potentially far better justified by the automatically generated thesaurus-type dictionaries produced by statistical association (see section VIII and appendix C), since hand construction of a stem dictionary would require little effort if an exhaustive concordance of the collection were available.
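
The unwelcome conflation discussed above can be illustrated with a minimal Python sketch. It is not the suffix dictionary procedure used in the experiments: the suffix list, the minimum stem length, and the names `stem`, `conflate`, and `overrides` are all assumptions made for illustration. The `overrides` table stands in for one possible hand-made amendment that keeps selected nouns apart from the rest of their stem group.

```python
from collections import defaultdict

# Hypothetical suffix list for illustration only; tried longest first.
SUFFIXES = sorted(
    ["ibility", "ible", "ions", "ion", "ors", "or", "ing", "ers", "er", "s"],
    key=len,
    reverse=True,
)


def stem(word, min_stem=4):
    """Strip the longest matching suffix, leaving at least min_stem letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def conflate(words, overrides=None):
    """Group words by stem; overrides maps a word to a hand-chosen group key."""
    overrides = overrides or {}
    groups = defaultdict(list)
    for word in words:
        groups[overrides.get(word, stem(word))].append(word)
    return dict(groups)


words = [
    "compressor", "compressors", "compressible", "compression",
    "compressibility", "computer", "computing",
]

# Purely automatic grouping: all five "compress" words share one group.
automatic = conflate(words)

# Assumed amendment: keep the engine-component noun in a group of its own.
amended = conflate(
    words,
    overrides={"compressor": "compressor", "compressors": "compressor"},
)
```

In this sketch the automatic grouping places "compressor" and "compressors" with "compressible", "compression", and "compressibility", while the amended grouping separates the first two. The sketch only shows the mechanism; it says nothing about how often such amendments would be needed in a real collection.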