Scientific Report No. IRS-13 Information Storage and Retrieval
Thesaurus, Phrase and Hierarchy Dictionaries
E. M. Keen
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
3. Initial versions of the thesaurus, and dictionaries made
without the construction rules, are inferior to revisions
and to versions made using the rules; in two out of seven
comparisons, the initial thesaurus versions perform worse
than the stem process (comparisons 8 and 12 worse than
stem; comparisons 1, 3, 6, 9, and 13 superior to stem).
4. The superiority of the thesaurus is not always preserved
when less-than-optimal document length and matching function
parameters are used; thus, of twelve comparisons, three
are inferior to stem (comparisons 22, 25, and 26 inferior;
comparisons 4, 5, 7, 11, 14, 15, 20, 24, and 27 superior).
5. For users needing high precision with only one or two
relevant documents, the thesaurus is little better than
stem on IRE-3, but on Cran-1 and ADI a larger superiority
for the thesaurus is evident (see Figs. 14, 15, and 16).
6. For users with a very high recall need, the thesaurus
gives a good improvement over stem on IRE-3, but on
Cran-1 and ADI only a very small gain is seen, using the
average rank of the last relevant document as a measure
(Figs. 14 and 15).
7. The thesaurus-SA1 on ADI, made by the semi-automatic
rules, does not perform well. It is in
all cases inferior to the regular ADI thesaurus-1, and
in four of five comparisons it is also inferior to stem
(comparisons 17, 19, 23, and 26 inferior; comparison 21
superior).
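The high-recall measure named in item 6, the average rank of the last relevant document, can be illustrated with a minimal sketch. The function names, document identifiers, and rankings below are hypothetical and are not taken from the report's data; the sketch assumes each query's ranked output contains every relevant document.

```python
from statistics import mean


def last_relevant_rank(ranking, relevant):
    """Return the 1-based rank at which the last relevant document is retrieved.

    `ranking` is a list of document ids in retrieval order; `relevant` is the
    set of ids judged relevant to the query. Both are illustrative inputs.
    """
    if not relevant:
        raise ValueError("query has no relevant documents")
    remaining = set(relevant)
    last = 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in remaining:
            remaining.discard(doc)
            last = rank
    if remaining:
        raise ValueError("ranking does not contain every relevant document")
    return last


def average_last_relevant_rank(runs):
    """Average the last-relevant rank over (ranking, relevant) pairs, one per query."""
    return mean(last_relevant_rank(ranking, relevant) for ranking, relevant in runs)


# Hypothetical two-query example; lower values favor the high-recall user.
stem_runs = [
    (["d3", "d1", "d7", "d2"], {"d1", "d2"}),
    (["d5", "d4", "d6"], {"d4"}),
]
thesaurus_runs = [
    (["d1", "d2", "d3", "d7"], {"d1", "d2"}),
    (["d4", "d5", "d6"], {"d4"}),
]

stem_score = average_last_relevant_rank(stem_runs)
thesaurus_score = average_last_relevant_rank(thesaurus_runs)
```

A lower average means a user who must see every relevant document has fewer documents to examine, so in this made-up example the thesaurus run would be preferred.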
Results comparing the thesaurus alone with the thesaurus plus
phrases are as follows:
1. Phrase dictionaries outperform the thesaurus alone only on
IRE-3 and ADI, and then by a very small amount; on Cran-1
the thesaurus alone gives a slightly better result
(Figs. 19 and 21).