ISR11
Scientific Report No. ISR-11 Information Storage and Retrieval
Information Analysis and Dictionary Construction
chapter
G. Salton
M. E. Lesk
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
less of any actual syntactic relation between the components; alternatively,
the presence of a phrase may be inferred whenever the components are
located within the same sentence of a given document, rather than merely
within the boundaries of the same document; finally, even more stringent
restrictions can be imposed before a phrase is actually accepted, by
checking that a pre-established syntactic relation actually exists between
the phrase components in the document under consideration.
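
The three acceptance criteria described above, ordered from least to most stringent, can be illustrated by a minimal sketch. The sketch is not the procedure used in any particular system: the representation of a sentence as a set of concepts together with a set of (dependent, governor) pairs, the function names, and the sample data are all assumptions introduced here for illustration only.

```python
from typing import List, Set, Tuple

# Hypothetical representation: each sentence is a pair consisting of the set
# of concepts it contains and a set of (dependent, governor) concept pairs.
Sentence = Tuple[Set[str], Set[Tuple[str, str]]]


def in_same_document(components: Set[str], doc: List[Sentence]) -> bool:
    """Loosest test: every component occurs somewhere in the document."""
    seen: Set[str] = set()
    for concepts, _ in doc:
        seen |= concepts
    return components <= seen


def in_same_sentence(components: Set[str], doc: List[Sentence]) -> bool:
    """Stricter test: all components occur together in at least one sentence."""
    return any(components <= concepts for concepts, _ in doc)


def syntactically_related(dependent: str, governor: str,
                          doc: List[Sentence]) -> bool:
    """Strictest test: some sentence contains both components and records
    the pre-established relation (dependent depends on governor)."""
    return any(
        {dependent, governor} <= concepts and (dependent, governor) in relations
        for concepts, relations in doc
    )


# Illustrative data only (assumed, not taken from any document collection).
phrase = {"program", "language"}
scattered = [({"program"}, set()), ({"language"}, set())]
unrelated = [({"program", "language"}, set())]
related = [({"program", "language"}, {("program", "language")})]
```

Under these assumptions, the document `scattered` satisfies only the document-level test, `unrelated` also satisfies the sentence-level test, and `related` alone satisfies the syntactic test for "program" depending on "language".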
In the SMART system, the phrase dictionaries are based on co-occurrences
of thesaurus concepts, rather than text words, and two principal strategies
are used for phrase detection: the so-called "statistical phrase" dictionary
is based on a phrase detection algorithm which takes into account only the
statistical co-occurrence characteristics of the phrase components;
specifically a statistical phrase is recognized, if and only if all phrase
components are present within a given document or within a given sentence
of a document, and no attempt is made to detect any particular syntactic
relation between the components; on the other hand, the "syntactic phrase"
dictionary includes not only the specification of the particular phrase
components which are to be detected, but also information about the permissible
syntactic dependency relations which must obtain if the phrase is to be
recognized. Thus, if it were desired to recognize the relationship between
the concept "program" and the concept "language", then any possible
combination of these two concepts such as, for example, "programming language",
"languages and programs", "linguistic programs", would be recognized as
proper phrases in the statistical phrase dictionary; in the syntactic
dictionary, on the other hand, an additional restriction would consist in
requiring that the concept corresponding to "program" be syntactically
dependent on the concept "language". This eliminates phrases such as