Scientific Report No. IRS-13 Information Storage and Retrieval
Thesaurus, Phrase and Hierarchy Dictionaries
E. M. Keen
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
also to consider cases where the thesaurus worsens performance. Retrieval
results from four of the Cran-1 requests are given in Fig. 28, using the
suffix `5' dictionary and the thesaurus-3 dictionary. Requests Q79 and Q225
perform better overall with the thesaurus, and requests Q167 and Q323 prefer
suffix `5'. The thesaurus improvement for documents 436 and 437 in Q79 is
reflected in the size of the correlation coefficient; this is because the
thesaurus produced some matches between request and document where suffix `5'
produced no matches at all. In Q225, document 07F is improved in rank by
37 places using the thesaurus, because the request contains the hyphenated
phrase "Boundary-layer", which the thesaurus matched with the component
words occurring separately in the document. This is an instance where the
suffix `5' or stem dictionary could cope with the problem if hyphens were
disregarded. With hyphens removed, the superiority of thesaurus over stem
would be reduced from 0.0193 to 0.0053 in normalized recall, and from
0.0248 to 0.0141 in normalized precision (Cran-1).
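The hyphen effect described above can be sketched in a few lines. This is only an illustration, not the dictionary procedure used in the report: the request and document texts are invented, the function names `tokenize` and `shared_terms` are hypothetical, and plain lowercased word matching stands in for the suffix or stem dictionary lookup.

```python
import re

def tokenize(text, split_hyphens):
    """Lowercase the text and return its word tokens.

    With split_hyphens=True a hyphen is treated as a space, so
    "Boundary-layer" yields "boundary" and "layer"; otherwise the
    hyphenated form is kept as one token.
    """
    text = text.lower()
    if split_hyphens:
        text = text.replace("-", " ")
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text)

def shared_terms(request, document, split_hyphens):
    """Return the set of tokens common to request and document."""
    return set(tokenize(request, split_hyphens)) & set(tokenize(document, split_hyphens))

# Invented example texts (assumption, not taken from Q225 or any Cran-1 document).
request = "Boundary-layer transition"
document = "transition in the boundary layer of a flat plate"

kept = shared_terms(request, document, split_hyphens=False)
split = shared_terms(request, document, split_hyphens=True)
print(sorted(kept))
print(sorted(split))
```

With hyphens kept, the hyphenated request term finds no counterpart in the document; with hyphens disregarded, both component words match, which is the kind of match the thesaurus supplied in the case above.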
A quite different way in which the thesaurus improves performance
is illustrated by document 655 in Q225; this item rises by 17 positions
in rank. Both suffix `5' and the thesaurus provide three matches between
request and document concepts, but the match with the concept "Boundary-
layer" receives a weight of 5 with the thesaurus and only 1 with suffix `5'.
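A small numeric sketch shows how the heavier weight can raise the request-document correlation even though the number of matches is unchanged. Several things here are assumptions rather than facts from the report: the cosine correlation is used as the matching function, the weight is applied to the concept in both the request and the document vector, and every concept name other than "boundary_layer" is invented.

```python
import math

def cosine(u, v):
    """Cosine correlation between two sparse concept-weight vectors."""
    dot = sum(w * v.get(concept, 0) for concept, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def vectors(boundary_layer_weight):
    """Build hypothetical request and document vectors.

    Three concepts match in both cases; only the weight of the
    "boundary_layer" concept changes. The document also carries three
    unmatched concepts of weight 1.
    """
    request = {"boundary_layer": boundary_layer_weight, "flow": 1, "heat": 1}
    document = {"boundary_layer": boundary_layer_weight, "flow": 1, "heat": 1,
                "wing": 1, "shock": 1, "plate": 1}
    return request, document

low = cosine(*vectors(1))    # weight 1, as with the suffix dictionary
high = cosine(*vectors(5))   # weight 5, as with the thesaurus
print(round(low, 4), round(high, 4))
```

Under these assumptions the correlation rises from about 0.71 to about 0.95: the heavily weighted shared concept dominates both vectors, so the unmatched document concepts count for less.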
The numeric vector weighting produced by the thesaurus thus proves
effective in this case, and the thesaurus with weights in effect acts as a
precision device. Documents 569 and 572 are improved by the
thesaurus for the same reason, thus showing that the thesaurus proves