Scientific Report No. IRS-13
Information Storage and Retrieval
Thesaurus, Phrase and Hierarchy Dictionaries
E. M. Keen
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

spurious match, but it gives considerably greater weight to a correct phrase match. Fig. 3 also shows that some new synonymous concepts are produced by phrases, since the related notions of "ultraviolet radiation" and "solar emission" are not properly related in the thesaurus dictionary alone.

It is a simple matter to invent examples where this kind of phrase processing can lead to spurious matches, both because thesaurus concept groups are used as phrase components, and because within-sentence occurrence is the only criterion for recognizing a phrase. However, the document collections in use deal with quite restricted subject areas, and an examination shows that around 90% of the phrases recognized are either completely correct or at least legitimate for retrieval purposes. An example of a legitimate, but not strictly correct, phrase is the recognition of "boundary conditions" in a sentence containing the phrases "boundary layer" and "surface conditions".

A more reasonable criticism of the phrase procedures is that too few phrases are listed in the dictionaries, as the data in Fig. 4 show. However, if more complete phrase recognition procedures were used, the size of the phrase dictionaries would vastly exceed the size of the present thesaurus dictionaries, and the co-occurrence recognition procedures to be used would probably have to become more sophisticated than is presently the case.

4. Description of Hierarchy Dictionaries

The use of hierarchies provides formal relationships used in processing