Scientific Report No. IRS-13 Information Storage and Retrieval
Document Length
chapter
E. M. Keen
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
is included in the indexing, but the abstract mentions only the more generic
ideas of "surface" and "walls" respectively, and the dictionaries in use do
not make the necessary connections. Another example of this type is a request
involving "transonic", where the indexer included that word from the text of
the document, but the abstractor just used the more specific notion "Mach
0.6 - 1.6". This is basically a difficult synonym recognition problem.
An analysis has also been made to attempt to find important subject
ideas that the indexers omitted but the abstractors included, but few examples
were found. One case is the concept of "Computing (time)", mentioned in the
abstract, but not in the indexing. It must be concluded that the main reason
for the superiority of the indexing is that the indexers did a better job
of making a precis of the full text than did the abstractors, at least in
relation to the search requests tested. The indexers both selected subject
notions that the abstractors missed, and also made shorter precis, which
prevented retrieval of non-relevant documents and thus increased precision.
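The two effects described above can be illustrated with a minimal sketch. The
documents, terms, request and relevance judgments below are hypothetical and
are not taken from the test collections. Retrieval is simplified to "at least
one shared term", which is an assumption made for brevity and not the matching
function used in the experiments.

```python
# Toy illustration only: all terms, documents and judgments are hypothetical.

def retrieve(request, collection):
    """Return ids of documents sharing at least one term with the request."""
    return {doc_id for doc_id, terms in collection.items() if request & terms}

def precision_recall(retrieved, relevant):
    """Precision and recall of a retrieved set against the relevant set."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

request = {"transonic", "flow"}
relevant = {"d1", "d2"}

# Indexing-like representation: short, with the requested notion present.
indexing = {
    "d1": {"transonic", "wing"},
    "d2": {"transonic", "flow"},
    "d3": {"boundary", "layer"},
    "d4": {"heat", "transfer"},
}

# Abstract-like representation: longer, with incidental terms, and with
# d1 described by a more specific term that the request does not match.
abstracts = {
    "d1": {"mach", "wing", "pressure"},
    "d2": {"transonic", "flow", "pressure"},
    "d3": {"boundary", "layer", "flow"},
    "d4": {"heat", "transfer", "flow"},
}

indexing_result = precision_recall(retrieve(request, indexing), relevant)
abstract_result = precision_recall(retrieve(request, abstracts), relevant)
print("indexing  (precision, recall):", indexing_result)
print("abstracts (precision, recall):", abstract_result)
```

In this toy case the abstract-like representation loses d1 through the
unrecognized synonym, which lowers recall, and gains d3 and d4 through the
incidental term "flow", which lowers precision.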
6. Conclusions
A simplified summary of the precision-recall curves, the normalized
measures and numbers of individual requests favoring a given option is presented
in Figure 39. Conclusions may be enumerated as follows:
a) The use of very short documents, namely, titles only, is unsatis-
factory in all collections for users requiring high recall. Recall
ceilings are 0.71 (Documentation), 0.78 (Aerodynamics) and 0.[OCRerr]
(Computer Science).
b) The use of titles only for users requiring high precision per-
formance is inferior to abstracts in all tests on the IRE-3 Collection