ISR11
Scientific Report No. ISR-11 Information Storage and Retrieval
Information Analysis and Dictionary Construction
chapter
G. Salton
M. E. Lesk
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
"linguistic programs", and "languages and programs", but would permit the
phrases "programming languages", or "programmed languages".
A typical excerpt from a statistical phrase dictionary used in
connection with the SMART system is shown in Fig. 5. It may be seen that
up to six phrase components are permitted in a given phrase, but that the
usual phrase specification consists of two, or at most three, components.
With each phrase included in Fig. 5 is listed a phrase concept number
which replaces the individual component concepts in a given document
specification whenever the corresponding phrase is detected by the phrase
processing algorithm in use. For example, the first line of Fig. 5 shows
that a phrase with concept number 543 is detected whenever the concepts
[OCRerr]44 and 608 are jointly present in the document under consideration.
Whenever such a phrase concept is attached to a given document specification,
the weight of the phrase concept can be increased over and above the original
weight of the component concepts to give the phrase specification added
importance.
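
The statistical phrase lookup just described can be sketched in a few lines
of Python. This is a minimal illustration under stated assumptions, not the
SMART implementation: the dictionary contents, the concept numbers, the
function name apply_statistical_phrases, and the weighting rule (mean of the
component weights multiplied by a boost factor) are all hypothetical. The
text states only that the phrase concept replaces its components and that
its weight may be raised above theirs.

```python
from typing import Dict, FrozenSet

# Hypothetical phrase dictionary: phrase concept number -> component concepts.
# These numbers are illustrative and are NOT taken from Fig. 5.
PHRASE_DICTIONARY: Dict[int, FrozenSet[int]] = {
    9001: frozenset({101, 202}),
    9002: frozenset({101, 303, 404}),
}

MAX_COMPONENTS = 6  # the text permits up to six components per phrase


def apply_statistical_phrases(
    doc: Dict[int, float],
    dictionary: Dict[int, FrozenSet[int]],
    boost: float = 1.5,
) -> Dict[int, float]:
    """Replace jointly present component concepts by their phrase concept.

    `doc` maps concept numbers to weights. The weighting rule (mean of the
    component weights times `boost`) is an assumption made for this sketch.
    """
    result = dict(doc)
    consumed = set()
    for phrase, components in dictionary.items():
        if len(components) > MAX_COMPONENTS:
            raise ValueError(f"phrase {phrase} has more than six components")
        # A statistical phrase is detected when all components co-occur.
        if all(c in doc for c in components):
            mean_weight = sum(doc[c] for c in components) / len(components)
            result[phrase] = boost * mean_weight
            consumed |= components
    # The phrase concept replaces the individual component concepts.
    for c in consumed:
        result.pop(c, None)
    return result


document = {101: 2.0, 202: 4.0, 505: 1.0}
print(apply_statistical_phrases(document, PHRASE_DICTIONARY))
# -> {505: 1.0, 9001: 4.5}
```

In this sketch the second phrase is not detected because concepts 303 and 404
are absent from the document, so only phrase 9001 replaces its components.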
Since the phrase components used in the SMART system represent concept
numbers rather than individual words, a given phrase concept number does
then in fact represent many different types of English word combinations
depending of course on the number of word stems assigned to each component
concept by the original thesaurus mapping.
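
To make the counting argument concrete, the following sketch enumerates the
word-stem combinations covered by a single two-component phrase. The
thesaurus entries, stems, concept numbers, and function names are
hypothetical and chosen only for illustration; they are not drawn from the
SMART thesaurus.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

# Hypothetical thesaurus mapping: word stem -> concept number.
THESAURUS: Dict[str, int] = {
    "comput": 101, "calculat": 101, "process": 101,
    "stor": 202, "memor": 202,
}


def stems_by_concept(thesaurus: Dict[str, int]) -> Dict[int, List[str]]:
    """Invert the thesaurus: concept number -> list of stems assigned to it."""
    groups: Dict[int, List[str]] = {}
    for stem, concept in thesaurus.items():
        groups.setdefault(concept, []).append(stem)
    return groups


def stem_combinations(
    components: Sequence[int], thesaurus: Dict[str, int]
) -> List[Tuple[str, ...]]:
    """List every stem combination that one phrase specification covers."""
    groups = stems_by_concept(thesaurus)
    return list(product(*(groups.get(c, []) for c in components)))


combos = stem_combinations((101, 202), THESAURUS)
print(len(combos))  # -> 6 (three stems for concept 101 times two for 202)
```

The number of combinations is the product of the stem counts of the
component concepts, which is why one phrase concept number can stand for
many distinct English word pairs.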
The syntactic phrase dictionary has a more complicated structure as
shown by the excerpt reproduced as Fig. 6. Here, each syntactic phrase,
also known as a "criterion tree" or "criterion phrase", consists not only
of a specification of the component concepts, but also of syntactic
indicators, as well as of syntactic relations which may obtain between the