Scientific Report No. ISR-11, Information Storage and Retrieval. Design Criteria for Automatic Information Systems. M. E. Lesk, G. Salton. Harvard University. Gerard Salton. Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

(null concon), where associated word stems are added to the original stems available for content identification, and the normal word stem process previously shown in Figs. 6 and 8. For all three subject areas, the word stem associations improve the recall values for the last few documents retrieved, over and above the values obtainable with the simple word stem matching process.

As an example of the performance of the concept-concept associations, consider search request QB2, titled "testing automated information systems", used with the ADI collection. One of the documents in this collection, number 80x, dealing with "experiments on documentation techniques", is relevant to the request but is ranked only 77th out of 82 by the regular word stem process, because very few of the words used in the document match the terms of the request. If concept-concept associations are used, additional related terms such as "efficient", "real", "reduce", "experimental", "frequency", etc. are generated; these added terms provide a bridge between "test" and "experiments", and between "information" and "documentation", thus accounting for the improved performance.

While word-word correlations improve the basic word stem matching process for high recall values, Fig. 12 shows that a well-constructed thesaurus is more powerful than the associative techniques applied to words. In other words, the thesaurus, which serves much the same purpose as the associative process, does so more accurately.
This leads to the following conclusion: Rule 7: Statistical concept-concept associations can be used to improve recall performance, particularly for collections for which a well-ordered synonym dictionary does not exist.
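
The bridging effect described for request QB2 can be illustrated with a minimal sketch. It is not the procedure used in the experiments: the toy stemmed collection, the cosine measure over term-document vectors, the 0.45 threshold, and the function names (term_vectors, cosine, expand, score) are all assumptions chosen for illustration. The sketch adds statistically associated stems to a request and shows that a document sharing no stems with the original request can still receive a non-zero match score.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy collection of already-stemmed documents (not the ADI data).
DOCS = {
    "d1": ["test", "experiment", "system"],
    "d2": ["test", "experiment", "inform"],
    "d3": ["experiment", "document", "technique"],
    "d4": ["inform", "document", "retrieve"],
}


def term_vectors(docs):
    """Map each stem to a Counter of {doc_id: frequency in that document}."""
    vectors = {}
    for doc_id, stems in docs.items():
        for stem in stems:
            vectors.setdefault(stem, Counter())[doc_id] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse term-document vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def expand(query, vectors, threshold=0.45):
    """Return {stem: weight}: original stems get weight 1.0, and each stem
    whose association with some query stem reaches the (assumed) threshold
    is added with that association strength as its weight."""
    weights = {stem: 1.0 for stem in query}
    for stem in query:
        if stem not in vectors:
            continue
        for other, vec in vectors.items():
            if other in query:
                continue
            sim = cosine(vectors[stem], vec)
            if sim >= threshold:
                weights[other] = max(weights.get(other, 0.0), sim)
    return weights


def score(doc_stems, weights):
    """Simple overlap score: sum of the weights of stems present in the document."""
    return sum(weights.get(stem, 0.0) for stem in set(doc_stems))


if __name__ == "__main__":
    query = ["test", "inform"]
    plain = {stem: 1.0 for stem in query}
    expanded = expand(query, term_vectors(DOCS))
    for doc_id, stems in DOCS.items():
        print(doc_id, round(score(stems, plain), 3), round(score(stems, expanded), 3))
```

In this toy collection, document d3 has no stem in common with the request, so plain stem matching scores it zero. After expansion, "experiment" enters through its association with "test" and "document" through its association with "inform", so d3 becomes retrievable. A hand-built thesaurus would make the same links explicitly rather than inferring them from co-occurrence.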