Scientific Report No. IRS-13 Information Storage and Retrieval

Document Length chapter
E. M. Keen
Harvard University
Gerard Salton

Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

documents. Figure 9 gives the number of requests that favor abstracts and the number that favor titles, using both normalized recall and normalized precision, for six results. These data confirm that the precision/recall curves in Figures 6 and 7 closely represent the actual situation, namely that abstracts are superior to titles: between 57.9% and 94.1% of the requests favor abstracts on the six runs, using normalized recall (ties being ignored). The superiority of abstracts is again most evident with the computer science collection, and least so in the aerodynamics collection.

Since the aerodynamics result in Figure 6 produces a crossing curve, two plots are given in Figure 10 of the normalized recall and normalized precision values for each of the 42 requests, showing the magnitude of the differences and comparing the 30 requests that favor abstracts with the 10 that favor titles. For example, one request had a normalized recall difference of 0.34 between abstracts and titles, while another request was better by 0.08 on titles than on abstracts. The requests are arranged in order of decreasing difference, and it is seen, using both normalized recall and normalized precision, that although ten requests did perform better on titles, there are ten requests that performed better on abstracts with a larger increase in performance.

This result does not explain the superiority of titles over a small range in the middle of the precision/recall curve seen in Figure 6(b), so further data are given in Figures 11 and 12 to explain this fact. In these tables, the individual relevant documents are examined, and the ranks of the 198 documents concerned are compared on abstracts and titles.
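The comparisons described above (counting the requests that favor each run with ties ignored, ordering requests by decreasing difference, and comparing the ranks of individual relevant documents) can be illustrated with a minimal Python sketch. The function name `tally`, the `Tally` record, and all request identifiers, scores, and ranks below are hypothetical and chosen only for illustration; they are not data from this report, and the sketch is not the procedure actually used to produce Figures 9 to 12.

```python
from typing import Dict, List, NamedTuple, Tuple


class Tally(NamedTuple):
    favor_a: int                      # items on which run A is better
    favor_b: int                      # items on which run B is better
    ties: int                         # items with no difference
    percent_favor_a: float            # share favoring A, ties ignored
    ordered: List[Tuple[str, float]]  # (item, advantage of A), largest first


def tally(a: Dict[str, float], b: Dict[str, float],
          higher_is_better: bool = True) -> Tally:
    """Compare two runs item by item (hypothetical helper).

    For normalized recall or precision a higher value is better; for
    rank positions of relevant documents a lower value is better, so
    the sign of the difference is reversed.
    """
    sign = 1.0 if higher_is_better else -1.0
    advantage = {key: sign * (a[key] - b[key]) for key in a}
    favor_a = sum(1 for d in advantage.values() if d > 0)
    favor_b = sum(1 for d in advantage.values() if d < 0)
    ties = len(advantage) - favor_a - favor_b
    decided = favor_a + favor_b
    percent = 100.0 * favor_a / decided if decided else 0.0
    ordered = sorted(advantage.items(), key=lambda kv: kv[1], reverse=True)
    return Tally(favor_a, favor_b, ties, percent, ordered)


# Illustrative normalized recall values per request (not from the report).
abstracts_recall = {"q1": 0.80, "q2": 0.62, "q3": 0.55, "q4": 0.90}
titles_recall = {"q1": 0.46, "q2": 0.70, "q3": 0.55, "q4": 0.85}
recall_tally = tally(abstracts_recall, titles_recall)

# Illustrative rank positions of relevant documents (lower is better).
abstracts_rank = {"d1": 3, "d2": 12, "d3": 5}
titles_rank = {"d1": 10, "d2": 7, "d3": 5}
rank_tally = tally(abstracts_rank, titles_rank, higher_is_better=False)

print(recall_tally.favor_a, recall_tally.favor_b, recall_tally.ties)
print(rank_tally.favor_a, rank_tally.favor_b, rank_tally.ties)
```

Excluding ties from the denominator mirrors the "ties being ignored" convention stated for the percentages, and sorting by the signed difference reproduces the decreasing-difference ordering used to arrange the requests.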
Figure 11 shows that 99 are superior on abstracts and 84 on titles, a close result that accurately describes the situation. Figure 12 further breaks down these 99 and 84 documents, showing, in a series of 10 ranges, the difference in rank positions for the 99 superior on abstracts and the 84 superior