Scientific Report No. IRS-13 Information Storage and Retrieval
Thesaurus, Phrase and Hierarchy Dictionaries
E. M. Keen
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
conclusion (comparisons 22 and 25), although ADI abstracts do not.
It is suggested therefore that a thesaurus with weights does have some
additional power, probably due to the precision device effect that has been
illustrated.
Two examples from the other collections are given in Fig. 29. The
ADI request QB10 has a worse-than-random normalized recall using the stem
dictionary, and the large improvements achieved by the thesaurus are due
mainly to the new synonym connection between "computerization" and
"computer" (which the stem dictionary does not conflate), the dropping of
the word "system" by making it a restricted word in the thesaurus, and the
very large increase in weight of important concepts such as "chemistry",
due to the synonym groupings.
If a small amount of human intervention in the weighting scheme were
permitted, a simple increase of three in the weight of the one vital request
concept "chemistry" would result in a thesaurus result of ranks 1, 2, 3, and
5 for the four relevant documents. The IRE-3 example shows cases of relevant
documents considerably worsened in rank by the thesaurus. In the case of
documents 200 and 382, for example, the thesaurus provides no increase
in weight to any of the concepts that matched on stem, and furnishes only
one additional matching concept. Also, the word "method" is dropped from
the thesaurus, an apparently sensible decision, but this highly weighted
term matched the request using the stem process, thus helping the result.
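The three thesaurus effects described above (a synonym link the stem
dictionary lacks, the pooling of weight in a synonym group, and the dropping
of a restricted word) can be illustrated with a minimal sketch. The
dictionaries, the request, the document, and the function names below are all
hypothetical toy values, not actual SMART entries, and an unnormalized inner
product stands in for whatever matching function the experiments used.

```python
from collections import Counter

# Hypothetical toy dictionaries; NOT the actual SMART stem or thesaurus entries.
STEMS = {
    "computerization": "computeriz",
    "computer": "comput",
    "chemistry": "chemist",
    "chemical": "chemic",
    "system": "system",
}
THESAURUS = {
    "computerization": "C_COMPUTER",
    "computer": "C_COMPUTER",
    "chemistry": "C_CHEMISTRY",
    "chemical": "C_CHEMISTRY",
}
RESTRICTED = {"system"}  # words the toy thesaurus deliberately drops


def stem_vector(words):
    """Count each word under its stem; unknown words stand for themselves."""
    return Counter(STEMS.get(w, w) for w in words)


def thesaurus_vector(words):
    """Count each word under its concept class, skipping restricted words."""
    return Counter(THESAURUS.get(w, w) for w in words if w not in RESTRICTED)


def match_score(query, document):
    """Unnormalized inner product over the concepts of the query."""
    return sum(weight * document[concept] for concept, weight in query.items())


# Hypothetical request and document texts, already reduced to content words.
request = ["computerization", "chemistry", "system"]
document = ["computer", "chemical", "chemistry", "system"]

stem_score = match_score(stem_vector(request), stem_vector(document))
thesaurus_score = match_score(thesaurus_vector(request), thesaurus_vector(document))
print("stem:", stem_score, "thesaurus:", thesaurus_score)
```

In this toy case the stem match rests on "chemistry" and the uninformative
"system", while the thesaurus match rests on the two concept classes, with
the chemistry class carrying the pooled weight of both of its synonyms. The
same mechanism can also work against a request, as in the IRE-3 example: a
dropped word removes a match that the stem process would have kept.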
These individual examples show that a considerable amount of
variation in individual requests is obscured by the use of averages alone.
This suggests that some method of making an accurate pre-search dictionary
choice would produce good results; attempts to come up with such a method
have, however, not succeeded so far.
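
The way an average can hide opposite movements in individual requests can be
sketched numerically. The code below assumes the usual rank-based definition
of normalized recall, 1 - (sum of relevant ranks - sum of ideal ranks) /
(n(N - n)), under which a random ranking scores about 0.5. The collection
size N = 82 and all ranks except the set 1, 2, 3, 5 quoted above are
hypothetical values chosen for illustration; they are not results from
Fig. 29.

```python
def normalized_recall(relevant_ranks, collection_size):
    """Normalized recall: 1.0 for a perfect ranking, 0.0 for the worst one."""
    n = len(relevant_ranks)
    if n == 0 or n >= collection_size:
        raise ValueError("need 0 < number of relevant documents < collection size")
    ideal_rank_sum = n * (n + 1) // 2  # ranks 1..n, the best possible case
    excess = sum(relevant_ranks) - ideal_rank_sum
    return 1.0 - excess / (n * (collection_size - n))


N = 82  # assumed collection size, for illustration only

# Hypothetical ranks of the relevant documents under each dictionary.
runs = {
    "request A": {"stem": [40, 50, 60, 70], "thesaurus": [1, 2, 3, 5]},
    "request B": {"stem": [2, 5], "thesaurus": [10, 30]},
}

# Normalized recall per request and per dictionary.
scores = {
    name: {d: normalized_recall(ranks, N) for d, ranks in by_dict.items()}
    for name, by_dict in runs.items()
}
# Average over requests, as an averaged evaluation would report it.
mean = {
    d: sum(s[d] for s in scores.values()) / len(scores)
    for d in ("stem", "thesaurus")
}
# Per-request change when moving from stem to thesaurus.
changes = {name: s["thesaurus"] - s["stem"] for name, s in scores.items()}

for name, s in scores.items():
    print(f"{name}: stem {s['stem']:.3f}, thesaurus {s['thesaurus']:.3f}")
print(f"mean: stem {mean['stem']:.3f}, thesaurus {mean['thesaurus']:.3f}")
```

With these invented ranks the mean favors the thesaurus, yet request B is
clearly worse under it. A per-request table of changes, rather than the mean
alone, is what exposes the kind of variation discussed above.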