Scientific Report No. IRS-13, Information Storage and Retrieval. Thesaurus, Phrase and Hierarchy Dictionaries (chapter). E. M. Keen. Harvard University. Gerard Salton. Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government. VI 1-6

Descriptions of the methods used by SMART have previously appeared in [2,3,5,6,8,9,10,11]. No studies have yet been made of full-scale phrase recognition, and the "statistical phrase" technique used is intended only to remove cases of single-word ambiguity. For example, a hypothetical medical request on "swine fever in New Guinea" will be quite strongly matched, using a thesaurus, with a document dealing with "diseases of the guinea pig". The use of a phrase dictionary containing "New Guinea" would give strong weight to the occurrence of both "New" and "Guinea" in a sentence, and thus the spurious match with "Guinea" in the sense of "guinea pig" would receive less weight by comparison.

The phrase dictionaries tested are handmade, and are based on the thesaurus groups. A phrase is recognized if its two or more component words (thesaurus concept numbers) appear in the same sentence; no specific word order, position, or syntactical relation is demanded. Phrases are used in retrieval as an addition to the thesaurus dictionary; thus, when a phrase occurs, a new concept identifier is added to the thesaurus concepts already assigned to the request or document, or the weight of an existing concept identifier is increased. These procedures may be clarified by the excerpt from a thesaurus and phrase dictionary given in Fig. 3.
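The recognition rule just described (all component concepts present in one sentence, in any order, adding or reinforcing a phrase concept) can be sketched in Python. This is a minimal illustration and not the SMART implementation: the concept numbers, the tiny thesaurus and phrase dictionary, the grouping of "swine" with "pig" and "fever" with "diseases", the unit weight increment, and the use of a cosine correlation are all assumptions made for the example.

```python
import math
from collections import Counter

# Hypothetical thesaurus: word -> concept number (all values invented).
THESAURUS = {
    "new": 101,
    "guinea": 102,
    "pig": 103,
    "swine": 103,     # assumed to share a thesaurus group with "pig"
    "fever": 104,
    "diseases": 104,  # assumed to share a thesaurus group with "fever"
}

# Hypothetical phrase dictionary: phrase concept -> component thesaurus concepts.
PHRASES = {
    900: frozenset({101, 102}),  # "New Guinea"
}

PHRASE_WEIGHT = 1  # assumed increment; the actual weighting is not given here


def concept_vector(sentences, use_phrases=True):
    """Map a request or document (a list of sentences) to concept weights."""
    vector = Counter()
    for sentence in sentences:
        concepts = [THESAURUS[w] for w in sentence.lower().split() if w in THESAURUS]
        vector.update(concepts)
        if use_phrases:
            present = set(concepts)
            for phrase_id, components in PHRASES.items():
                # Recognized when every component occurs in the same sentence;
                # word order and syntactic relation are ignored.
                if components <= present:
                    vector[phrase_id] += PHRASE_WEIGHT
    return vector


def cosine(a, b):
    """Cosine correlation between two concept vectors (an assumed measure)."""
    dot = sum(weight * b[concept] for concept, weight in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
        sum(w * w for w in b.values())
    )
    return dot / norm if norm else 0.0


request_text = ["swine fever in New Guinea"]
document_text = ["diseases of the guinea pig"]

for flag in (False, True):
    score = cosine(
        concept_vector(request_text, use_phrases=flag),
        concept_vector(document_text, use_phrases=flag),
    )
    print("phrases" if flag else "thesaurus only", round(score, 3))
```

In this toy setting the phrase concept is assigned to the request but not to the document, so the spurious match is weakened relative to the thesaurus-only run, although it is not removed.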
The phrase made up from the thesaurus groups containing "axial" and "symmetry" is of value because the word "axial" is more commonly to be found in conjunction with "compressor"; thus, without phrase processing, any document dealing with "axial compressors" that also contains a concept identifier such as "regular" or "uniform" could be matched with a request for "axial symmetry". The addition of phrase processing in this example does not prevent such a