Scientific Report No. ISR-13 Information Storage and Retrieval
Document Length (chapter)
E. M. Keen
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
on titles. It can be seen that 28 of the documents superior on abstracts
have improved rank positions (compared with titles) by 76 to 150 places,
thus explaining why many of the requests do work better on abstracts. Since
a large number of documents exhibit quite significant improvements in rank
on titles compared with abstracts, however, the results that show superiority
of titles in the middle of the precision-recall curve seem quite reasonable.
The results presented so far have all been based on the cosine cor-
relation and numeric vector matching procedure, which is generally superior to
simpler procedures. Results are given in Figures 13 to 16 based on unweighted
vectors (logical) using the overlap correlation, comparing titles and abstracts
with the stem, Cranfield collection. For this process, the title match is
superior by a small amount at the high precision end of the curve, below
0.65 recall, and this result is also reflected in the normalized measures
(Figure 13). This same precision superiority is seen in the number of requests
favoring abstracts and titles in Figure 14, where using normalized recall,
the abstracts are superior, but using normalized precision the titles do
better. The difference curve, also given in Figure 14, shows that using
normalized precision all but 2 of the 24 requests performing better with
titles do so by a greater difference than the 18 which are better on the
abstracts. Figures 15 and 16 give data for the 198 individual relevant
documents involved, showing that 13 relevant documents changed rank by over
100 places in favor of abstracts, but that more documents changed a smaller
number of places in favor of titles than abstracts.
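For reference, the normalized recall and normalized precision cited above are
usually defined as follows; the standard SMART forms are assumed here, and the
report's own definitions may differ in detail. With N documents in the
collection, n of them relevant to a request, and r_i the rank of the i-th
relevant document in the retrieved list:

\[
R_{\mathrm{norm}} = 1 - \frac{\sum_{i=1}^{n} r_i - \sum_{i=1}^{n} i}{n(N - n)},
\qquad
P_{\mathrm{norm}} = 1 - \frac{\sum_{i=1}^{n} \log r_i - \sum_{i=1}^{n} \log i}{\log \binom{N}{n}}
\]

Under these definitions both measures equal 1 for a perfect ranking (all
relevant documents at the top of the list) and approach 0 for the worst
possible ranking.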
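To make the contrast between the two matching procedures concrete, the sketch
below compares a cosine correlation on weighted (numeric) term vectors with an
overlap correlation on unweighted (logical) term sets. It is an illustration
only: the term names, weights, and the shared-terms-over-smaller-set form of
the overlap measure are assumptions, not definitions quoted from this report.

import math

def cosine_correlation(query, doc):
    # Cosine correlation between two weighted (numeric) term vectors,
    # each given as a dictionary mapping a term to its weight.
    dot = sum(w * doc.get(t, 0.0) for t, w in query.items())
    norm_q = math.sqrt(sum(w * w for w in query.values()))
    norm_d = math.sqrt(sum(w * w for w in doc.values()))
    if norm_q == 0.0 or norm_d == 0.0:
        return 0.0
    return dot / (norm_q * norm_d)

def overlap_correlation(query_terms, doc_terms):
    # Overlap correlation between two unweighted (logical) term sets:
    # number of shared terms divided by the size of the smaller set.
    query_terms, doc_terms = set(query_terms), set(doc_terms)
    if not query_terms or not doc_terms:
        return 0.0
    return len(query_terms & doc_terms) / min(len(query_terms), len(doc_terms))

# Hypothetical title and abstract vectors for the same document.
title = {"boundary": 1.0, "layer": 1.0, "transition": 1.0}
abstract = {"boundary": 3.0, "layer": 3.0, "transition": 1.0,
            "supersonic": 2.0, "flow": 2.0, "heat": 1.0}
query = {"boundary": 1.0, "layer": 1.0, "heat": 1.0}

print(cosine_correlation(query, title))      # weighted, numeric match
print(cosine_correlation(query, abstract))
print(overlap_correlation(query, title))     # unweighted, logical match
print(overlap_correlation(query, abstract))

As the example suggests, the longer abstract vector can pick up query terms
that the title lacks, while the overlap measure on short logical vectors is
more sensitive to the size of the smaller (title) vector.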
These figures are presented in order to show that there is no incon-
sistency between the results on abstracts and titles obtained with SMART and
those obtained on the Cranfield Project (2). The results of searches on the
same titles and abstracts, using the same collection, requests and relevance