Scientific Report No. IRS-13 Information Storage and Retrieval
An Analysis of the Documentation Requests
E. M. Keen
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
the negative statement was removed; in requests A8 and B11 the difficulties
caused by common words used in a technical sense prompted selection
of one or two synonyms for the given words; and in two requests keypunching
errors which preserved hyphenated words were corrected. These six
modifications are all thought to represent reasonable demands that would be
made of users of an operational system.
These 24 requests are now processed together with the 12 requests
for which no modification was made; they are described as "Hand Modified",
a total of 36 results because request A1 is split into two. Comparison of
retrieval performance of the modified with the original unmodified requests
is made for six retrieval runs in Figures 16 and 17. All precision-recall
curves for the hand-modified requests show them to be superior over the whole
performance range, with precision increases of more than 5% at most recall
values and of nearly 10% in the middle recall ranges.
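
The kind of comparison described above can be illustrated with a minimal sketch. It assumes that each request yields a ranked output in which the rank positions of the relevant documents are known; the rank lists, the function names, and the interpolation rule (highest precision at or beyond a given recall level) are hypothetical choices made for illustration and are not taken from the report.

```python
def precision_recall_points(relevant_ranks):
    """Return (recall, precision) after each relevant document is retrieved.

    relevant_ranks: 1-based rank positions of the relevant documents.
    """
    ranks = sorted(relevant_ranks)
    total = len(ranks)
    points = []
    for found, rank in enumerate(ranks, start=1):
        # 'found' relevant documents have been seen in the first 'rank' documents.
        points.append((found / total, found / rank))
    return points


def interpolated_precision(points, recall_level):
    """Highest precision at any point whose recall is >= recall_level.

    This interpolation rule is an assumption of the sketch.
    """
    candidates = [p for r, p in points if r >= recall_level]
    return max(candidates) if candidates else 0.0


# Hypothetical rank positions for one request, before and after modification.
original = precision_recall_points([2, 5, 9, 14])
modified = precision_recall_points([1, 4, 6, 12])

for level in (0.25, 0.5, 0.75, 1.0):
    gain = interpolated_precision(modified, level) - interpolated_precision(original, level)
    print(f"recall {level:.2f}: precision change {gain:+.3f}")
```

Averaging such per-request differences over all requests at each recall level would give a curve comparison of the general kind plotted in Figures 16 and 17, although the exact averaging method used there is not specified in this passage.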
Using the Abstract thesaurus result for analysis, the six requests
that were quite severely modified did not perform very well: only B11 was
notably improved, and some of the others received a worse performance. Of
the seventeen requests that had triply weighted important words, ten were
improved, five had a worse performance, and two remained unaffected. Four
of the ten that were improved are shown in Figure 18, and the two that were
worsened by the greatest amounts are given in Figure 19, with rank positions
for all the relevant documents and normalized measures.
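
As a hedged illustration of how rank positions translate into normalized measures, the sketch below assumes that the measures meant are normalized recall and normalized precision in their commonly stated forms, where n relevant documents at ranks r_i in a collection of N documents give R = 1 - (sum r_i - sum i) / (n(N - n)) and P = 1 - (sum log r_i - sum log i) / log(N! / ((N - n)! n!)). The function names, the example ranks, and the collection size are hypothetical.

```python
import math


def normalized_recall(ranks, collection_size):
    """Normalized recall from the rank positions of the relevant documents.

    Assumes 0 < len(ranks) < collection_size.
    """
    n = len(ranks)
    ideal = sum(range(1, n + 1))  # best case: relevant documents at ranks 1..n
    return 1.0 - (sum(ranks) - ideal) / (n * (collection_size - n))


def normalized_precision(ranks, collection_size):
    """Normalized precision: same idea, using logarithms of the ranks."""
    n = len(ranks)
    actual = sum(math.log(r) for r in ranks)
    ideal = sum(math.log(i) for i in range(1, n + 1))
    # log(N! / ((N - n)! n!)), computed with lgamma to avoid huge factorials.
    worst = (math.lgamma(collection_size + 1)
             - math.lgamma(collection_size - n + 1)
             - math.lgamma(n + 1))
    return 1.0 - (actual - ideal) / worst


# Hypothetical request: three relevant documents in a 50-document collection.
before = [3, 10, 27]
after = [1, 4, 15]
print(normalized_recall(before, 50), normalized_recall(after, 50))
print(normalized_precision(before, 50), normalized_precision(after, 50))
```

Both measures equal 1 when the relevant documents occupy the top ranks and 0 when they occupy the bottom ranks, so a request is improved by modification when its values move toward 1.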
It is of interest to note that at present these hand modifications
do produce a superior result to the relevance feedback process described
elsewhere (3). Figure 20 includes a comparison, using an evaluation technique
that differs from the plots in Figures 16 and 17 in order to achieve
a fully user-oriented evaluation (4,5). Further work on relevance feedback