Scientific Report No. ISR-11
Information Storage and Retrieval
Design Criteria for Automatic Information Systems
M. E. Lesk, G. Salton
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

A phrase generation process which does not use a complete syntactic analysis of the phrase components may be expected to lead to many "false phrases", where components are combined which do not belong together (such as "information retrieval" in the sentence "for people in need of information, retrieval is imperative"). The experimental evidence, reflected in the relatively poor performance of the syntactic process, makes it appear that such occurrences are very rare. This leads to Rules 4 and 5:

Rule 4: Absolute accuracy in the analysis of every single item is not so important as the accumulation of a maximum number of correctly analyzed items. If a choice exists between a method which can produce one guaranteed correct content indication (syntactic analysis), and another which produces five indicators of which four are probably correct (statistical phrase process), the second is generally to be preferred.

Rule 5: Simple phrase generation methods lead to a definite improvement in recall at the expense of some initial loss in precision in the low recall region.

D) Statistical Association Methods

Statistical association methods are those which use the co-occurrence frequency of two words, or two dictionary concepts, within a given document collection as an indication of a relationship between them.[15,16] Thus, if two given terms co-occur in many of the documents of a collection, or in many sentences within a given document, a non-zero correlation coefficient can be computed as a function of the number of co-occurrences.
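As an illustration of how such a coefficient might be computed, the following is a minimal sketch and not the procedure used in the experiments. It assumes that each document is reduced to its set of terms, and it takes a cosine-type measure (co-occurrence count divided by the square root of the product of the two individual document frequencies) as the function of the number of co-occurrences. The text does not specify the formula, so this choice, the function name `cooccurrence_correlations`, and the sample documents are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cooccurrence_correlations(documents):
    """Return {(term_a, term_b): coefficient} for every co-occurring term pair.

    Assumed measure (not taken from the report): n_ab / sqrt(n_a * n_b),
    where n_a and n_b count the documents containing each term and n_ab
    counts the documents containing both.
    """
    term_counts = Counter()
    pair_counts = Counter()
    for doc in documents:
        # Count each term once per document; sort so pair keys are canonical.
        terms = sorted(set(doc))
        term_counts.update(terms)
        pair_counts.update(combinations(terms, 2))
    return {
        pair: n / sqrt(term_counts[pair[0]] * term_counts[pair[1]])
        for pair, n in pair_counts.items()
    }


# Hypothetical four-document collection.
docs = [
    ["information", "retrieval", "system"],
    ["information", "retrieval"],
    ["information", "storage"],
    ["retrieval", "evaluation"],
]
coefficients = cooccurrence_correlations(docs)
print(round(coefficients[("information", "retrieval")], 3))  # prints 0.667
```

Pairs that never appear together receive no entry at all, which corresponds to a zero coefficient. The same function can be applied to sentences within one document by passing the sentences in place of the documents.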
If this coefficient is sufficiently high, the two terms can be grouped, and can be assigned jointly to documents and search requests. Associative methods are