Scientific Report No. IRS-13
Information Storage and Retrieval
Document Length chapter
E. M. Keen
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

C) Abstracts versus Indexing

Overall performance measures are given in Figures 27 and 28. The indexing is in all cases superior to the abstracts, except for the stem dictionary at the very low recall end of the precision/recall curve (Figure 28 a). Indexing also has a slightly superior recall ceiling, as seen in Figure 29. The individual request data and difference plots in Figures 30 and 31 reinforce these results: between 51.3% and 6[OCRerr].i% of the requests are superior on indexing. The superiority of the indexing is small but quite marked, and was observed to be similar in the tests conducted at Cranfield (see Figure 32).

A positive explanation as to why the indexing is superior awaits analysis not yet performed, because the effects of two separate factors which differ between the indexing and the abstracts cannot be distinguished. The first factor is that the indexers were free to choose terms from the whole documents, so the indexing is expected to incorporate at least some subject notions that the abstractors did not include. The second factor is the one of primary interest here, namely document length (or indexing exhaustivity), which for the indexing was roughly half that of the abstracts.

Both these factors may be observed in the results from the Cranfield Project, presented in Figure 33. The tables give search results at two coordination levels (corresponding to a demand of two matching keywords) for five different document lengths, the shortest being titles only, then three levels of exhaustivity of indexing, and finally the longest being the abstracts. The indexing results previously examined used the [OCRerr] 311 level.
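The notion of a coordination level search over documents of different lengths can be illustrated with a minimal sketch. It assumes that a document is retrieved at coordination level k when it shares at least k keywords with the request. The function names, the three toy documents, the request, and the relevance judgments are all hypothetical and are not taken from the Cranfield or SMART data.

```python
def coordination_search(request_terms, documents, level):
    """Return ids of documents sharing at least `level` terms with the request."""
    request = set(request_terms)
    return {
        doc_id
        for doc_id, terms in documents.items()
        if len(request & set(terms)) >= level
    }


def precision_recall(retrieved, relevant):
    """Precision and recall of a retrieved set against known relevant ids."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical toy data: the same three documents described at two lengths.
titles = {
    "d1": ["boundary", "layer"],
    "d2": ["heat", "transfer"],
    "d3": ["wing", "flutter"],
}
abstracts = {
    "d1": ["boundary", "layer", "laminar", "flow", "heat"],
    "d2": ["heat", "transfer", "boundary", "plate", "flow"],
    "d3": ["wing", "flutter", "supersonic", "flow", "heat"],
}
request = ["boundary", "layer", "heat", "flow"]
relevant = {"d1", "d2"}  # assumed relevance judgments for this request

# Search each document length at coordination level 2 (two matching keywords).
for name, collection in (("titles", titles), ("abstracts", abstracts)):
    retrieved = coordination_search(request, collection, level=2)
    p, r = precision_recall(retrieved, relevant)
    print(name, sorted(retrieved), f"precision={p:.2f} recall={r:.2f}")
```

In this toy case the longer descriptions retrieve more documents at the same coordination level, both relevant and non-relevant. The example shows only the mechanics of the comparison and says nothing about the sizes of the effects reported in Figure 33.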
Figure 33 b) suggests that the indexing probably included some ideas not in the abstracts, since 5 additional relevant documents were found with the indexing as against the abstracts. The effect of document length is seen in