Scientific Report No. IRS-13 Information Storage and Retrieval
An Analysis of the Documentation Requests
E. M. Keen
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
actually performed poorly (two "A" and one "B").
The performance results in Figure 14 compare the "A" and "B" requests
for two dictionary runs, each made on two different document lengths, text
and abstracts. The expected superiority of the "A" requests is seen in the
stem dictionary results, but with the thesaurus, the "B" requests perform
slightly better. Since the "B" requests are quite inferior to "A" with the
stem dictionary, this does leave a greater opportunity for improvement with
the thesaurus; the initial inferiority on stem requires an explanation,
however, as well as the fact that the thesaurus does not much improve the
"A" requests. It is difficult to isolate any fundamental reasons for this
result, because individual problems with both the stem and thesaurus
dictionaries seem primarily to be the cause, as the following example shows.
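The example is stated in terms of normalized recall, which this section does not define. As a minimal sketch, assuming the usual rank-based definition (one minus the excess of the summed ranks of the relevant documents over their ideal sum, divided by n(N - n)), the code below shows how rank improvements translate into a change in the measure. The function names and the collection size of 82 documents are assumptions, not taken from this section; 82 is used only because it is consistent with the figures quoted for request B10.

```python
# Sketch only: assumes the standard rank-based definition of normalized recall.
# ASSUMED_COLLECTION_SIZE is an assumption, not a value stated in this section.

ASSUMED_COLLECTION_SIZE = 82


def normalized_recall(relevant_ranks, collection_size):
    """Return 1 - (sum of actual ranks - sum of ideal ranks) / (n * (N - n))."""
    n = len(relevant_ranks)
    if not 0 < n < collection_size:
        raise ValueError("need at least one relevant and one non-relevant document")
    ideal_sum = n * (n + 1) // 2  # relevant documents ranked 1..n
    excess = sum(relevant_ranks) - ideal_sum
    return 1.0 - excess / (n * (collection_size - n))


def gain_from_rank_improvements(rank_improvements, collection_size):
    """Change in normalized recall when each relevant document moves up
    by the given number of rank positions."""
    n = len(rank_improvements)
    return sum(rank_improvements) / (n * (collection_size - n))


# Rank improvements quoted for request B10 (thesaurus compared with stem).
gain = gain_from_rank_improvements([22, 26, 32, 60], ASSUMED_COLLECTION_SIZE)
print(round(gain, 4))
```

Under these assumptions, the four rank improvements quoted for B10 (140 positions in total) account for the whole difference between the stem value of 0.3718 and the thesaurus value of 0.8205.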
Request B10, with a normalized recall of 0.8205 with thesaurus and
0.3718 with stem, has four documents assessed as relevant, and the thesaurus
produces improvements in rank positions of 22, 26, 32, and 60 compared with
stem. Reasons for the superiority of thesaurus in this case are:
a) The thesaurus provides additional matching concepts between
the request and all four relevant documents, including the impor-
tant concept "computer". The stem dictionary fails to match this
concept, because the suffix routine used does not conflate all word
forms, and "computation" is separated from "compute" and "computer".
b) The thesaurus does not contain "system" but regards it as a common
word to be ignored, and although the stem dictionary uses it and
establishes matches with all four relevant documents, this high
frequency word also establishes matches with many non-relevant
documents.
c) The very important request concept "chemistry" is grouped with
synonyms in the thesaurus which successfully increase the weight