ISR11 Scientific Report No. ISR-11 Information Storage and Retrieval. Information Analysis and Dictionary Construction (chapter). G. Salton, M. E. Lesk. Harvard University. Gerard Salton. Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

IV-23

"linguistic programs", and "languages and programs", but would permit the phrases "programming languages", or "programmed languages".

A typical excerpt from a statistical phrase dictionary used in connection with the SMART system is shown in Fig. 5. It may be seen that up to six phrase components are permitted in a given phrase, but that the usual phrase specification consists of two, or at most three, components. With each phrase included in Fig. 5 is listed a phrase concept number which replaces the individual component concepts in a given document specification whenever the corresponding phrase is detected by the phrase processing algorithm in use. For example, the first line of Fig. 5 shows that a phrase with concept number 543 is detected whenever the concepts [illegible]44 and 608 are jointly present in the document under consideration. Whenever such a phrase concept is attached to a given document specification, the weight of the phrase concept can be increased over and above the original weight of the component concepts to give the phrase specification added importance.

Since the phrase components used in the SMART system represent concept numbers rather than individual words, a given phrase concept number does then in fact represent many different types of English word combinations, depending of course on the number of word stems assigned to each component concept by the original thesaurus mapping.

The syntactic phrase dictionary has a more complicated structure, as shown by the excerpt reproduced as Fig. 6.
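Before turning to the syntactic case, the statistical phrase lookup described above may be illustrated by a minimal sketch. It is not the SMART phrase processing algorithm itself, only an illustration under stated assumptions: the dictionary entries and concept numbers are hypothetical (they are not taken from Fig. 5), the function name apply_statistical_phrases is invented for this example, and the weighting rule (mean of the component weights times a boost factor) is an assumption, since the text says only that the phrase weight can be increased over that of the components.

```python
from typing import Dict, Tuple

# The text states that up to six components are permitted in a phrase.
MAX_COMPONENTS = 6

# Hypothetical dictionary excerpt: phrase concept number -> component concepts.
# These numbers are illustrative only and do not come from Fig. 5.
PHRASE_DICTIONARY: Dict[int, Tuple[int, ...]] = {
    900: (101, 202),
    901: (101, 303, 404),
}


def apply_statistical_phrases(
    doc: Dict[int, float],
    dictionary: Dict[int, Tuple[int, ...]],
    boost: float = 1.5,
) -> Dict[int, float]:
    """Return a new document specification (concept number -> weight).

    A phrase is detected when all of its component concepts are jointly
    present in the document. The phrase concept then replaces the
    components. Its weight is an ASSUMED rule: boost * mean component weight.
    """
    for components in dictionary.values():
        if not 1 <= len(components) <= MAX_COMPONENTS:
            raise ValueError("a phrase must have between 1 and 6 components")

    result = dict(doc)
    for phrase_concept, components in dictionary.items():
        # Detection is done against the original specification, so that
        # phrases sharing a component can both be recognized.
        if all(c in doc for c in components):
            mean_weight = sum(doc[c] for c in components) / len(components)
            result[phrase_concept] = boost * mean_weight
            for c in components:
                result.pop(c, None)  # component concepts are replaced
    return result


if __name__ == "__main__":
    document = {101: 2.0, 202: 4.0, 555: 1.0}
    print(apply_statistical_phrases(document, PHRASE_DICTIONARY))
```

In this sketch concepts 101 and 202 are jointly present, so they are replaced by the hypothetical phrase concept 900, while concept 555 is left unchanged. Because the components are concept numbers rather than words, a single entry of this kind would cover every word combination whose stems map to those concepts in the thesaurus.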
Here, each syntactic phrase, also known as a "criterion tree" or "criterion phrase", consists not only of a specification of the component concepts, but also of syntactic indicators, as well as of syntactic relations which may obtain between the