Scientific Report No. IRS-13 Information Storage and Retrieval Document Length chapter E. M. Keen Harvard University Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

Figures 21 and 22 show, respectively, the number of requests favoring abstract and text using two dictionaries, and magnitude difference plots for the stem dictionary, chosen because stem favors abstracts more than text in Figure 21, using normalized precision. The differences between text and abstract are always small, and usually in favor of text.

The precision/recall curves for titles only are added to those for abstract and text in Figure 23; the data on individual requests in Figure 2[OCRerr], comparing the three document lengths, again show the expected order of merit. Data for the 170 relevant documents concerned are given in Figure 25. Of the six possible orders of merit for the three document lengths, it is interesting to note that merit orders "A" and "F" are observed for more documents than any of the other orders. Documents in A are clearly matched poorly with the request using titles, and the two increases in length improve the match and rank positions of the [OCRerr]7 documents concerned. Documents in F probably match the requests quite well on titles, and increases in document length serve only to increase the matches with non-relevant documents, thus worsening the ranks of these 36 relevant documents. The abstracts came off worse by this evaluation, but text is best for many relevant documents.

Retrieval runs using full text were also made without the abstracts, although the title was always included. In the results presented here text includes abstract, and this change does provide a slight improvement in performance, as the normalized measures in Figure 26 show.
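For readers unfamiliar with the normalized measures referred to above, the following is a minimal sketch, assuming the usual definitions of normalized recall and normalized precision in terms of the rank positions of the relevant documents. The formulas are not stated in this section, so they are an assumption here; the function names and the collection size of 10 used in the usage note are hypothetical.

```python
import math


def normalized_recall(ranks, n_docs):
    """Normalized recall for one request (assumed standard definition).

    ranks  -- rank positions (1 = first) of the relevant documents
    n_docs -- total number of documents in the collection
    Requires 0 < len(ranks) < n_docs.
    """
    n = len(ranks)
    ideal = sum(range(1, n + 1))  # relevant documents ranked 1..n
    return 1.0 - (sum(ranks) - ideal) / (n * (n_docs - n))


def normalized_precision(ranks, n_docs):
    """Normalized precision for one request (assumed standard definition).

    Uses logarithms of the ranks, so that changes near the top of the
    ranked list weigh more heavily than changes near the bottom.
    """
    n = len(ranks)
    actual = sum(math.log(r) for r in ranks)
    ideal = sum(math.log(i) for i in range(1, n + 1))
    # log of N! / ((N - n)! n!), computed with lgamma to avoid huge factorials
    worst_gap = (math.lgamma(n_docs + 1)
                 - math.lgamma(n + 1)
                 - math.lgamma(n_docs - n + 1))
    return 1.0 - (actual - ideal) / worst_gap
```

Under these definitions both measures equal 1 when the relevant documents occupy the top ranks and 0 when they occupy the bottom ranks; for example, with two relevant documents in a hypothetical collection of 10, ranks [1, 2] give 1 and ranks [9, 10] give 0.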
Despite this outcome, the ADI abstracts are thought to be rather poor; some are rather short, and do not seem adequately to cover the text for document retrieval purposes. It is suggested that if better abstracts were available they might have a superior performance (apart from recall ceiling) to full text.
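The merit-order comparison of the three document lengths discussed for Figure 25 can be sketched as follows. This is an illustration only, under stated assumptions: the rank values are invented, the names `merit_order` and `tally_merit_orders` are hypothetical, and ties are broken in the fixed order title, abstract, text, which the report does not specify. Orders are labeled by the permutation itself rather than by the letters used in the figure, since the full lettering is not given in this section.

```python
from collections import Counter

LENGTHS = ("title", "abstract", "text")


def merit_order(ranks):
    """Return the three document lengths ordered best first.

    ranks -- dict mapping each length to the rank (1 = best) that one
             relevant document received under that length.
    Ties keep the order of LENGTHS, because sorted() is stable
    (an assumption; the report does not say how ties were handled).
    """
    return tuple(sorted(LENGTHS, key=lambda name: ranks[name]))


def tally_merit_orders(per_document_ranks):
    """Count how many relevant documents fall into each merit order."""
    counts = Counter()
    for ranks in per_document_ranks:
        counts[merit_order(ranks)] += 1
    return counts


# Hypothetical ranks for three relevant documents (not data from the report).
example = [
    {"title": 40, "abstract": 12, "text": 5},   # improves as length grows
    {"title": 3, "abstract": 9, "text": 20},    # worsens as length grows
    {"title": 30, "abstract": 10, "text": 4},   # improves as length grows
]
tally = tally_merit_orders(example)
```

In this sketch the order (text, abstract, title) appears to correspond to the behavior described for group A, where each increase in length improves the rank, and (title, abstract, text) to the behavior described for group F, where each increase worsens it.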