Scientific Report No. IRS-13 Information Storage and Retrieval
Thesaurus, Phrase and Hierarchy Dictionaries
E. M. Keen
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
superior not always through the introduction of additional matching
concepts.
Cases of the superiority of suffix `s' over thesaurus are also
shown in Fig. 28, Q167 and Q323. For example, relevant document 916
matches with five request concepts for both suffix `s' and thesaurus;
but since the thesaurus process fails to match with any additional
request concepts, and also provides no increase in the weight of any
of the matching concepts, document 916 is pushed down in rank by
non-relevant documents such as 728. In the case of 728, which matches one
concept on suffix `s' only, the thesaurus provides additional matching
concepts; also, since 728 is a short document, it produces a high cosine
correlation coefficient and receives the first rank position. In Q323,
non-relevant document 316 is matched by four concepts with the thesaurus,
and although the thesaurus establishes two additional matches with
relevant document 34A, this is not sufficient to prevent non-relevant
documents from occupying the top rank positions.
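
The effect of document length on the cosine correlation can be
illustrated with a minimal sketch. The concept names, weights and
document sizes below are hypothetical and are not taken from the test
collection; unit concept weights are assumed throughout. The sketch shows
only that a short document matching few request concepts can obtain a
higher cosine correlation than a long document matching more of them.

```python
import math


def cosine(query, doc):
    """Cosine correlation between two sparse concept-weight vectors (dicts)."""
    dot = sum(weight * doc.get(concept, 0.0) for concept, weight in query.items())
    query_norm = math.sqrt(sum(w * w for w in query.values()))
    doc_norm = math.sqrt(sum(w * w for w in doc.values()))
    if query_norm == 0.0 or doc_norm == 0.0:
        return 0.0
    return dot / (query_norm * doc_norm)


# Hypothetical request with six equally weighted concepts.
query = {f"q{i}": 1.0 for i in range(1, 7)}

# Hypothetical long document: matches five request concepts,
# but also carries forty concepts that are not in the request.
long_doc = {f"q{i}": 1.0 for i in range(1, 6)}
long_doc.update({f"x{i}": 1.0 for i in range(40)})

# Hypothetical short document: matches only three request concepts
# and carries a single concept that is not in the request.
short_doc = {"q1": 1.0, "q2": 1.0, "q3": 1.0, "y1": 1.0}

docs = {"long_doc": long_doc, "short_doc": short_doc}

# Rank documents by decreasing cosine correlation with the request.
ranked = sorted(docs, key=lambda name: cosine(query, docs[name]), reverse=True)

for name in ranked:
    print(name, round(cosine(query, docs[name]), 3))
```

In this sketch the short document takes the first rank position despite
matching fewer request concepts, because the many non-matching concepts
of the long document enlarge its vector length and so reduce its
correlation.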
These examples from the Cran-1 collection lead to the question
of whether the lessened superiority of thesaurus over stem, compared
with IRE-3 and ADI, is due to a poor thesaurus dictionary or to something
in the Cran-1 test environment. Evidence strongly points to the latter
reason. Cran-1 has real user relevance decisions that, on inspection,
provide a severe test environment, since these decisions
sometimes bear little relation to the stated request. The superiority
of suffix `s' over the stem dictionary is not found on IRE-3 and ADI;
in Section V, the reason for this is stated to be the terminology employed