Scientific Report No. ISR-13: Information Storage and Retrieval
Thesaurus, Phrase and Hierarchy Dictionaries (chapter by E. M. Keen)
Harvard University; Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

2. The present practice of reducing the weight of ambiguous terms (where "ambiguous" refers to terms grouped in more than one place in the thesaurus) should be evaluated.

3. The degree of overlap among thesaurus groups is at present kept very low, but one example of a dictionary with a large amount of overlap produced good performance; this phenomenon needs to be investigated.

4. Thesaurus dictionaries that place many terms in very few concepts do not necessarily perform poorly, contrary to what was originally believed. Unpublished results for a version of the ADI thesaurus-1, in which the concepts are further grouped by statistical association into approximately 170 concepts, show performance somewhat superior to that of thesaurus-1 alone. The occasional cases in which the IRE-3 hierarchy proved valuable for individual requests show that a broad grouping can sometimes work well. The relevance feedback results presented in Report ISR-12 show that greatly expanded requests can often be used to improve the ranks of relevant documents that were initially ranked poorly. These examples point to the need to examine the grouping problem in depth.

5. An aid to improving thesaurus grouping might be the construction of a thesaurus by "hindsight", that is, by using information about the known relevant documents for each search request. Such an optimum thesaurus might then be built in an attempt to discover more rules and principles of grouping.

6. An operational use of thesaurus-type dictionaries might be aided by the construction of "near" and "far" synonym thesauruses.
The near synonym thesaurus would contain only very closely related words and would always be used by the system, whereas the far synonym thesaurus would include broad groupings of many words and would be used only to permit a manual pre-search selection of terms.
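Items 2, 3, and 6 above all concern how words are mapped to thesaurus groups. The Python sketch below, built on an invented toy thesaurus, shows one way to measure group overlap, to reduce the weight of words grouped in more than one place, and to keep a separate "far" thesaurus that only offers suggestions for manual pre-search selection. The 1/k weighting rule, the data, and all names (`NEAR_THESAURUS`, `FAR_THESAURUS`, `invert`, `overlap_fraction`, `concept_vector`, `far_candidates`) are hypothetical illustrations, not the procedure used in the report.

```python
from collections import defaultdict

# Hypothetical toy "near" thesaurus: concept class -> words grouped under it.
# A word listed under more than one class is "ambiguous" in the sense of item 2.
NEAR_THESAURUS = {
    "C1": {"retrieval", "search", "lookup"},
    "C2": {"index", "indexing", "catalog"},
    "C3": {"catalog", "list"},
}

# Hypothetical "far" thesaurus: broad groups used only for manual suggestions.
FAR_THESAURUS = {
    "F1": {"retrieval", "search", "index", "catalog", "document", "library"},
}


def invert(thesaurus):
    """Map each word to the sorted list of concept classes that contain it."""
    classes = defaultdict(list)
    for concept, words in thesaurus.items():
        for word in words:
            classes[word].append(concept)
    return {word: sorted(cs) for word, cs in classes.items()}


def overlap_fraction(thesaurus):
    """Fraction of distinct words grouped in more than one class (item 3)."""
    word_classes = invert(thesaurus)
    if not word_classes:
        return 0.0
    ambiguous = sum(1 for cs in word_classes.values() if len(cs) > 1)
    return ambiguous / len(word_classes)


def concept_vector(words, thesaurus):
    """Replace request words by concept classes from the near thesaurus.

    Assumed weighting rule (one possibility only): a word grouped in k
    classes contributes 1/k to each, so ambiguous words count for less.
    """
    word_classes = invert(thesaurus)
    vector = defaultdict(float)
    for word in words:
        classes = word_classes.get(word, [])
        for concept in classes:
            vector[concept] += 1.0 / len(classes)
    return dict(vector)


def far_candidates(words, far_thesaurus):
    """List far-thesaurus words related to the request for a user to review.

    Nothing is added to the request automatically (item 6).
    """
    request = set(words)
    suggestions = set()
    for group in far_thesaurus.values():
        if group & request:
            suggestions |= group
    return sorted(suggestions - request)


if __name__ == "__main__":
    query = ["catalog", "search"]
    print(overlap_fraction(NEAR_THESAURUS))
    print(concept_vector(query, NEAR_THESAURUS))
    print(far_candidates(query, FAR_THESAURUS))
```

With the request `["catalog", "search"]`, "catalog" is split evenly between C2 and C3 while "search" keeps full weight in C1; the far thesaurus merely proposes "document", "index", "library", and "retrieval" for a person to accept or reject.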
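Item 4 mentions a version of the ADI thesaurus-1 whose concepts were further grouped by statistical association, but the report does not say which association measure or grouping method was used. The sketch below assumes a cosine association computed from binary document occurrence of concepts and single-link merging at a fixed threshold; the documents, the threshold values, and the names `DOCS`, `association`, and `group_concepts` are hypothetical and chosen only to illustrate the idea.

```python
import math
from itertools import combinations

# Hypothetical documents, each listing the concept classes assigned to it
# (invented data, not taken from the ADI collection).
DOCS = [
    {"C1", "C2"},
    {"C1", "C2", "C4"},
    {"C3"},
    {"C3", "C4"},
    {"C1", "C2"},
]


def association(a, b, docs):
    """Cosine association: co-occurrences / sqrt(freq(a) * freq(b))."""
    fa = sum(1 for d in docs if a in d)
    fb = sum(1 for d in docs if b in d)
    both = sum(1 for d in docs if a in d and b in d)
    if fa == 0 or fb == 0:
        return 0.0
    return both / math.sqrt(fa * fb)


def group_concepts(docs, threshold):
    """Merge concepts whose pairwise association reaches the threshold.

    Uses union-find, so grouping is single-link: two concepts end up together
    if a chain of sufficiently associated pairs connects them.
    """
    concepts = sorted(set().union(*docs))
    parent = {c: c for c in concepts}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    for a, b in combinations(concepts, 2):
        if association(a, b, docs) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for c in concepts:
        groups.setdefault(find(c), []).append(c)
    return sorted(groups.values())


if __name__ == "__main__":
    print(group_concepts(DOCS, 0.5))
    print(group_concepts(DOCS, 0.4))
```

Lowering the threshold produces fewer, broader groups: at 0.5 the toy data yield two groups, while at 0.4 the weak C1/C4 association chains everything into a single class. This is the same "many terms in very few concepts" trade-off that item 4 suggests should be examined rather than assumed to be harmful.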