Scientific Report No. IRS-13 Information Storage and Retrieval
Thesaurus, Phrase and Hierarchy Dictionaries
E. M. Keen
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
spurious match, but it gives considerably greater weight to a correct
phrase match. Fig. 3 also shows that some new synonymous concepts are
produced by phrases, since the related notions of "ultraviolet radiation"
and "solar emission" are not properly related in the thesaurus dictionary
alone.
It is a simple matter to invent examples where this kind of phrase
processing can lead to spurious matches, both because thesaurus concept
groups are used as phrase components, and because within-sentence occurrence
is the only criterion for recognizing a phrase. However, the document
collections in use deal with quite restricted subject areas, and an examination
shows that around 90% of the phrases recognized are either completely
correct or at least legitimate for retrieval purposes. An example of a
legitimate, but not strictly correct, phrase is the recognition of "boundary
conditions" in a sentence containing the phrases "boundary layer" and "surface
conditions".
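
The within-sentence criterion described above can be illustrated with a
minimal sketch in Python. It is not the procedure actually programmed for the
system: the concept numbers, the dictionary contents, and the names THESAURUS,
PHRASES, concepts_in and recognize_phrases are all hypothetical and are
introduced only for illustration. The sketch assumes that a phrase is
recognized whenever all of its component concept groups occur in one sentence,
regardless of word order or adjacency.

```python
import re

# Hypothetical thesaurus: word -> concept group number (numbers are invented).
THESAURUS = {
    "boundary": 101,
    "layer": 102,
    "surface": 103,
    "condition": 104,
    "conditions": 104,
}

# Hypothetical phrase dictionary: phrase name -> component concept groups.
PHRASES = {
    "boundary conditions": {101, 104},
    "boundary layer": {101, 102},
}


def concepts_in(sentence):
    """Return the set of thesaurus concept groups found in one sentence."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return {THESAURUS[w] for w in words if w in THESAURUS}


def recognize_phrases(text):
    """Recognize a phrase when all its concepts co-occur within a sentence."""
    found = []
    for sentence in re.split(r"[.!?]+", text):
        present = concepts_in(sentence)
        for name, components in PHRASES.items():
            # Subset test: every component concept occurs in this sentence.
            if components <= present and name not in found:
                found.append(name)
    return found
```

Applied to a sentence such as "The boundary layer depends on the surface
conditions.", this sketch reports both "boundary layer" and the legitimate but
not strictly correct "boundary conditions", since the concepts for "boundary"
and "conditions" fall within the same sentence. When the two word pairs stand
in separate sentences, only "boundary layer" is reported.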
A more reasonable criticism of the phrase procedures is that too few
phrases are listed in the dictionaries, as the data in Fig. 4
show. However, if more complete phrase recognition procedures were used,
the size of the phrase dictionaries would vastly exceed the size of the
present thesaurus dictionaries, and the co-occurrence recognition procedures
to be used would probably have to become more sophisticated than is presently
the case.
4. Description of Hierarchy Dictionaries
The use of hierarchies provides formal relationships used in processing