Scientific Report No. IRS-13 Information Storage and Retrieval

Chapter: Document Length
E. M. Keen
Harvard University
Gerard Salton

Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

is included in the indexing, but the abstract mentions only the more generic ideas of "surface" and "walls" respectively, and the dictionaries in use do not make the necessary connections. Another example of this type is a request involving "transonic", where the indexer included that word from the text of the document, but the abstractor just used the more specific notion "Mach 0.6 - 1.6". This is basically a difficult synonym recognition problem.

An analysis has also been made to attempt to find important subject ideas that the indexers omitted but the abstractors included, but few examples were found. One case is the concept of "Computing (time)", mentioned in the abstract, but not in the indexing.

It must be concluded that the main reason for the superiority of the indexing is that the indexers did a better job of making a precis of the full text than did the abstractors, at least in relation to the search requests tested. The indexers both selected subject notions that the abstractors missed, and also made shorter precis, which prevented retrieval of non-relevant documents and thus increased precision.

6. Conclusions

A simplified summary of the precision recall curves, the normalized measures and numbers of individual requests favoring a given option is presented in Figure 39. Conclusions may be enumerated as follows:

a) The use of very short documents, namely, titles only, is unsatisfactory in all collections for users requiring high recall. Recall ceilings are 0.71 (Documentation), 0.78 (Aerodynamics) and 0.[OCRerr] (Computer Science).

b) The use of titles only for users requiring high precision performance is inferior to abstracts in all tests on the IRE-3 Collection