IRS13
Scientific Report No. ISR-13 Information Storage and Retrieval
Thesaurus, Phrase and Hierarchy Dictionaries
E. M. Keen
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
2. The present practice of reducing the weight of ambiguous
terms (where "ambiguous" refers to terms grouped in more
than one place in the thesaurus) should be evaluated.
3. The degree of overlap among thesaurus groups is at present
kept very low, but one example of a dictionary with a large
amount of overlap produced good performance; an investigation
of this phenomenon is needed.
4. Thesaurus dictionaries using many terms in very few concepts
do not necessarily perform poorly, as was originally believed.
Unpublished results for a version of the ADI thesaurus-1, in
which a further grouping of the concepts is made by statistical
association to form approximately 170 concepts, give a
performance somewhat superior to that of thesaurus-1 alone.
The occasional examples of the value of the IRE-3 hierarchy to
individual requests show that a broad grouping can sometimes
work well. The relevance feedback results presented in Report
ISR-12 show that very greatly expanded requests can often be
used to improve the ranks of initially poorly ranked relevant
documents. These examples point up the need to examine the
grouping problem in depth.
5. An aid to improvements in thesaurus grouping might be the
construction of a thesaurus by "hindsight," that is, by
using information about given relevant documents in
relation to their search requests. An optimum thesaurus
might then be made in an attempt to discover more rules
and principles.
6. An operational use of thesaurus-type dictionaries might
be aided by the construction of "near" and "far" synonym
thesauruses. The near synonym thesaurus would contain only
very closely related words, and would always be
used by the system, but the far synonym thesaurus would
include groupings of many words that would be used only
to permit a manual pre-search selection.
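
Two of the suggestions above (items 2 and 6) can be made concrete with a
short sketch. The Python code below is an illustration only, not the
procedure used in the reported experiments. It assumes that an ambiguous
term's weight is split equally among the groups it belongs to, and that
far synonyms are added only when a searcher has approved them before the
search. The toy thesaurus, the word lists, and the function names are all
hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy thesaurus: concept number -> member terms.
# "bank" is grouped in two places, so it is "ambiguous" in the sense of item 2.
THESAURUS = {
    101: {"bank", "shore"},
    102: {"bank", "finance"},
    103: {"retrieval", "search"},
}

def term_to_concepts(thesaurus):
    """Invert the thesaurus: term -> list of concept numbers containing it."""
    index = defaultdict(list)
    for concept, terms in sorted(thesaurus.items()):
        for term in terms:
            index[term].append(concept)
    return index

def concept_vector(tokens, thesaurus, reduce_ambiguous=True):
    """Map tokens to weighted concepts.

    Assumption: when reduce_ambiguous is True, a term found in n groups
    contributes 1/n to each group; otherwise it contributes 1 to each.
    """
    index = term_to_concepts(thesaurus)
    vector = Counter()
    for token in tokens:
        concepts = index.get(token, [])
        for concept in concepts:
            share = 1.0 / len(concepts) if reduce_ambiguous else 1.0
            vector[concept] += share
    return dict(vector)

# Hypothetical near and far synonym lists for item 6.
NEAR = {"retrieval": {"search"}}
FAR = {"retrieval": {"indexing", "lookup", "catalog"}}

def expand_request(tokens, near, far, approved_far=frozenset()):
    """Always add near synonyms; add a far synonym only if it was
    selected manually before the search (approved_far)."""
    expanded = list(tokens)
    for token in tokens:
        expanded.extend(sorted(near.get(token, ())))
        expanded.extend(sorted(w for w in far.get(token, ()) if w in approved_far))
    return expanded
```

Calling concept_vector with reduce_ambiguous set to False gives the
unreduced weights, so the two settings can be compared on the same
requests, which is the kind of evaluation item 2 asks for. In
expand_request, the approved_far argument stands in for the manual
pre-search selection described in item 6.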