IRS13
Scientific Report No. IRS-13 Information Storage and Retrieval
An Analysis of the Documentation Requests
E. M. Keen
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
is expected to result in improvements, but for high precision requirements
hand modified requests may always be superior (at the cost of increased user
effort).
The treatment of important request words might be made more drastic,
for example by the use of an "essential" word rule, which would only present
to the searcher documents that contain the noted important words. This
strategy could in fact be achieved by assigning very high weights to the
important concepts (weights of several hundred would be needed in text runs);
alternatively, modifications could be made to the search algorithm. It
is almost certain that this procedure would imply that some relevant
documents would never be found, although large increases in precision might
be possible.
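
The two routes described above can be illustrated with a minimal sketch. It is not the search code used in these experiments: the document identifiers, term sets, the simple summed-weight matching function, and the names rank and essential_word_filter are all assumptions made for illustration, and the rule is read here as requiring every essential word to be present.

```python
from typing import Dict, List, Set


def rank(query_weights: Dict[str, float], docs: Dict[str, Set[str]]) -> List[str]:
    """Rank document ids by the summed weight of the query terms they contain."""
    scores = {
        doc_id: sum(w for term, w in query_weights.items() if term in terms)
        for doc_id, terms in docs.items()
    }
    # Descending score; ties are broken by document id so the order is stable.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


def essential_word_filter(
    ranked: List[str], docs: Dict[str, Set[str]], essential: Set[str]
) -> List[str]:
    """Drop every ranked document that lacks any of the essential words."""
    return [doc_id for doc_id in ranked if essential <= docs[doc_id]]


# Hypothetical three-document collection and request (illustrative only).
docs = {
    "d1": {"index", "automatic", "language"},
    "d2": {"automatic", "language", "retrieval"},
    "d3": {"index", "retrieval"},
}
query = {"index": 1.0, "automatic": 1.0, "language": 1.0}

# Ordinary search: all request words weighted equally.
plain = rank(query, docs)

# Route 1: change the search procedure so that documents without the
# essential word "index" are never shown to the searcher.
filtered = essential_word_filter(plain, docs, {"index"})

# Route 2: keep the procedure and give the essential word a very high
# weight, so documents containing it rise above all the others.
boosted = rank({**query, "index": 100.0}, docs)

print(plain, filtered, boosted)
```

In this toy case the filter removes d2 from the output entirely, whereas the high weight only moves it to the bottom of the ranking; a relevant document lacking the essential word is lost to the searcher under the first route.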
For example, for seven requests containing important concepts chosen
at random, only 9% of the relevant items would be lost, and although actual
precision results cannot be calculated, of the 86 non-relevant documents that
were given rank positions above 16 in the output, 32 would be excluded by
this rule. Other requests subsequently examined occasionally produce a much
greater recall ceiling, and also a greater precision improvement, so that this
procedure is worth further experimentation.
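
The figures quoted for the seven requests can be restated as proportions. The short calculation below uses only the numbers given above (9% of relevant items lost, 32 of 86 highly ranked non-relevant documents excluded); the variable names are illustrative, and treating one minus the lost fraction as the recall ceiling is an interpretation of the text rather than a reported result.

```python
# Figures quoted in the text for the seven sample requests.
relevant_lost_fraction = 0.09   # share of relevant items the rule would lose
nonrelevant_high_ranked = 86    # non-relevant documents ranked high in the output
nonrelevant_excluded = 32       # of those, the number removed by the rule

# Highest recall still attainable once the rule is applied.
recall_ceiling = 1.0 - relevant_lost_fraction

# Share of the highly ranked non-relevant documents that the rule removes.
fraction_excluded = nonrelevant_excluded / nonrelevant_high_ranked

# Non-relevant documents that would still appear at those rank positions.
nonrelevant_remaining = nonrelevant_high_ranked - nonrelevant_excluded

print(f"recall ceiling: {recall_ceiling:.2f}")
print(f"non-relevant excluded: {fraction_excluded:.1%}")
print(f"non-relevant remaining: {nonrelevant_remaining}")
```

On these numbers the rule gives up about one relevant item in eleven in exchange for removing a little over a third of the highly ranked non-relevant documents.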
6. Performance Effectiveness and Search Procedures
A comparison of the retrieval results obtained in the documentation
collection with the performance of the aerodynamics and computer science
collections shows the documentation results to be quite inferior. Data
concerning this fact were given in Section I, where it was seen that with a
stem dictionary in use, at 0.80 recall, the amount of non-relevant examined