Scientific Report No. ISR-11, Information Storage and Retrieval. Chapter: Information Analysis and Dictionary Construction, by G. Salton and M. E. Lesk. Harvard University, Gerard Salton. Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

less of any actual syntactic relation between the components. Alternatively, the presence of a phrase may be inferred whenever the components are located within the same sentence of a given document, rather than merely within the boundaries of the same document. Finally, even more stringent restrictions can be imposed before a phrase is actually accepted, by checking that a pre-established syntactic relation actually exists between the phrase components in the document under consideration. In the SMART system, the phrase dictionaries are based on co-occurrences of thesaurus concepts rather than text words, and two principal strategies are used for phrase detection. The so-called "statistical phrase" dictionary is based on a phrase detection algorithm which takes into account only the statistical co-occurrence characteristics of the phrase components. Specifically, a statistical phrase is recognized if and only if all phrase components are present within a given document, or within a given sentence of a document, and no attempt is made to detect any particular syntactic relation between the components. The "syntactic phrase" dictionary, on the other hand, includes not only the specification of the particular phrase components which are to be detected, but also information about the permissible syntactic dependency relations which must obtain if the phrase is to be recognized.
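The two detection strategies can be contrasted in a short sketch. This is not the SMART implementation. The miniature thesaurus `THESAURUS` and the functions `concepts`, `statistical_phrase`, and `syntactic_phrase` are hypothetical names chosen for illustration. The syntactic test also assumes that some parser has already produced a list of (head word, dependent word) pairs for the text. The statistical test checks only that every component concept co-occurs, at either document or sentence scope. The syntactic test additionally requires a specified dependency between the two concepts.

```python
import re

# Hypothetical miniature thesaurus: each text word maps to a concept class.
# (Illustrative assumption only; a real thesaurus would be far larger.)
THESAURUS = {
    "program": "PROGRAM", "programs": "PROGRAM", "programming": "PROGRAM",
    "language": "LANGUAGE", "languages": "LANGUAGE", "linguistic": "LANGUAGE",
}


def concepts(text):
    """Return the set of thesaurus concepts whose words occur in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    return {THESAURUS[w] for w in words if w in THESAURUS}


def statistical_phrase(phrase, document, scope="document"):
    """Statistical phrase test: all component concepts must co-occur,
    either anywhere in the document or within one sentence.
    No syntactic relation between the components is examined."""
    if scope == "document":
        return phrase <= concepts(document)
    # Crude sentence split on terminal punctuation (an assumption).
    sentences = re.split(r"[.!?]+", document)
    return any(phrase <= concepts(s) for s in sentences)


def syntactic_phrase(head, dependent, dependencies):
    """Syntactic phrase test: some word of the `dependent` concept must be
    syntactically dependent on some word of the `head` concept.
    `dependencies` is an assumed list of (head_word, dependent_word) pairs."""
    return any(
        THESAURUS.get(h) == head and THESAURUS.get(d) == dependent
        for h, d in dependencies
    )


if __name__ == "__main__":
    phrase = {"PROGRAM", "LANGUAGE"}
    doc = "Each program was tested. The language was new."
    print(statistical_phrase(phrase, doc))                    # True
    print(statistical_phrase(phrase, doc, scope="sentence"))  # False
    # "programming language": "programming" assumed dependent on "language".
    print(syntactic_phrase("LANGUAGE", "PROGRAM", [("language", "programming")]))  # True
    # A text whose assumed parse links the two concepts in no dependency.
    print(syntactic_phrase("LANGUAGE", "PROGRAM", []))        # False
```

Moving from document scope to sentence scope, and then to an explicit dependency check, corresponds to the increasingly stringent acceptance criteria described above. Each step admits fewer candidate phrases.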
Thus, if it were desired to recognize the relationship between the concept "program" and the concept "language", then any possible combination of these two concepts, such as "programming language", "languages and programs", or "linguistic programs", would be recognized as a proper phrase by the statistical phrase dictionary. In the syntactic dictionary, on the other hand, an additional restriction would require that the concept corresponding to "program" be syntactically dependent on the concept "language". This eliminates phrases such as