Scientific Report No. IRS-13 Information Storage and Retrieval

An Analysis of the Documentation Requests. Chapter by E. M. Keen.
Harvard University. Gerard Salton.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

...the negative statement was removed; in requests A8 and B11 the difficulties caused by common words used in a technical sense prompted the selection of one or two synonyms for the given words; and in two requests keypunching errors which preserved hyphenated words were corrected. These six modifications are all thought to represent reasonable demands that could be made of users of an operational system. These 24 requests are now processed together with the 12 requests for which no modification was made; they are described as "Hand Modified", giving a total of 36 results because request A1 is split into two. The retrieval performance of the modified requests is compared with that of the original, unmodified requests for six retrieval runs in Figures 16 and 17. All precision-recall curves for the hand modified requests show them to be superior over the whole performance range, with increases in precision of more than 5% at most recall values and of nearly 10% in the middle recall ranges. Using the Abstract thesaurus result for analysis, the six requests that were quite severely modified did not perform very well: only B11 was notably improved, and some of the others received a worse performance. Of the seventeen requests that had triply weighted important words, ten were improved, five had a worse performance, and two remained unaffected. Four of the ten that were improved are shown in Figure 18, and the two that were worsened by the greatest amounts are given in Figure 19, with the rank positions of all the relevant documents and the normalized measures.
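The effect of triply weighting important request words, and the rank-based normalized measures used to judge it, can be illustrated with a small sketch. This is not the procedure used in the report: the toy documents, the vocabulary, the relevance judgments, and the helper names (`weight_query`, `relevant_ranks`, and so on) are all hypothetical. Cosine correlation is assumed as the matching function, and the normalized recall and normalized precision below are the usual definitions computed from the rank positions of the relevant documents, which are assumed here to correspond to the report's "normalized measures".

```python
import math
from collections import Counter

# Hypothetical toy collection: each document is a dict of term -> weight.
DOCS = {
    "d1": {"thesaurus": 2, "retrieval": 1},
    "d2": {"system": 3, "retrieval": 1},
    "d3": {"system": 1, "design": 1},
    "d4": {"thesaurus": 1, "construction": 1, "system": 1},
    "d5": {"language": 1, "design": 1},
    "d6": {"system": 1, "language": 2},
}


def cosine(q, d):
    """Cosine correlation between two sparse term-weight vectors (assumed matching function)."""
    dot = sum(w * d.get(t, 0.0) for t, w in q.items())
    nq = math.sqrt(sum(w * w for w in q.values()))
    nd = math.sqrt(sum(w * w for w in d.values()))
    return dot / (nq * nd) if nq and nd else 0.0


def weight_query(words, important, factor=3.0):
    """Term-frequency query vector; words in `important` are multiplied by `factor` (3 = "triply weighted")."""
    counts = Counter(words)
    return {t: f * (factor if t in important else 1.0) for t, f in counts.items()}


def relevant_ranks(query, docs, relevant):
    """Rank all documents by decreasing cosine (ties broken by name) and
    return the sorted 1-based rank positions of the relevant documents."""
    order = sorted(docs, key=lambda name: (-cosine(query, docs[name]), name))
    return sorted(order.index(name) + 1 for name in relevant)


def normalized_recall(ranks, n_docs):
    """1 - (sum of actual ranks - sum of ideal ranks) / (n * (N - n))."""
    n = len(ranks)
    ideal = sum(range(1, n + 1))
    return 1.0 - (sum(ranks) - ideal) / (n * (n_docs - n))


def normalized_precision(ranks, n_docs):
    """1 - (sum log r_i - sum log i) / log(C(N, n))."""
    n = len(ranks)
    actual = sum(math.log(r) for r in ranks)
    ideal = sum(math.log(i) for i in range(1, n + 1))
    return 1.0 - (actual - ideal) / math.log(math.comb(n_docs, n))


if __name__ == "__main__":
    words = ["system", "thesaurus"]   # hypothetical request words
    relevant = {"d1", "d4"}           # hypothetical relevance judgments
    for label, important in (("unmodified", set()), ("hand modified", {"thesaurus"})):
        ranks = relevant_ranks(weight_query(words, important), DOCS, relevant)
        print(label, ranks,
              round(normalized_recall(ranks, len(DOCS)), 3),
              round(normalized_precision(ranks, len(DOCS)), 3))
```

With this toy data the unmodified request places the two relevant documents at ranks 1 and 3 (normalized recall 0.875), while tripling the weight of "thesaurus" moves them to ranks 1 and 2, the ideal ordering. Tripling a weight can just as easily push a relevant document down when the emphasized word is a poor discriminator, which is consistent with some requests being worsened by the modification.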
It is of interest to note that, at present, these hand modifications produce a better result than the relevance feedback process described elsewhere [3]. Figure 20 includes this comparison, using an evaluation technique that differs from the one used for the plots in Figures 16 and 17 in order to achieve a fully user-oriented evaluation [4,5]. Further work on relevance feedback