Scientific Report No. IRS-13 Information Storage and Retrieval

Document Length
chapter
E. M. Keen
Harvard University
Gerard Salton

Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

judgment, and using coordination level matching (which is virtually identical to overlap-logical matching on SMART) obtained at Cranfield are given in Figure 17. The differences between test techniques in the case of SMART and Cranfield reside in the dictionaries used (the Cranfield word forms language is similar, but not identical, to the SMART stem dictionary), and also in the methods of calculating the average recall/precision curve. It is seen that this last matter is still a partly unsolved problem, since the two Cranfield plots presented are not totally consistent. Figure 17 b) comes closest to the methods used by SMART, and comparison with Figure 13 shows a similar result except at the low recall end. It would seem that the addition of a weighting scheme, as used in SMART, does not help the title performance much, but does improve the abstracts, so that in circumstances where such weighting may be practiced even the Cranfield results do show a reasonable superiority of abstracts over titles.

B) Abstracts versus Full Text

Overall performance measures are given in Figures 18 and 19. Seven comparisons of abstracts and full text are given in Figure 18 using the normalized measures, and two comparisons using precision/recall curves in Figure 19. In all cases the full text is superior to abstract alone, but the difference is always small.
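The normalized measures are not defined in this part of the report. As an illustrative sketch only, assuming they are the normalized recall and normalized precision commonly associated with SMART evaluation, they can be computed from the ranks at which the relevant documents are retrieved. The function names and the sample ranks below are hypothetical and are not taken from the report.

```python
import math


def _check(relevant_ranks, collection_size):
    """Validate the ranks (1-based, distinct) of the relevant documents."""
    n, N = len(relevant_ranks), collection_size
    if not 0 < n < N:
        raise ValueError("need at least one relevant and one non-relevant document")
    if len(set(relevant_ranks)) != n or min(relevant_ranks) < 1 or max(relevant_ranks) > N:
        raise ValueError("ranks must be distinct integers between 1 and collection_size")
    return n, N


def normalized_recall(relevant_ranks, collection_size):
    """1 - (sum of actual ranks - sum of ideal ranks) / (n * (N - n))."""
    n, N = _check(relevant_ranks, collection_size)
    actual = sum(relevant_ranks)
    ideal = sum(range(1, n + 1))  # ideal case: relevant documents at ranks 1..n
    return 1.0 - (actual - ideal) / (n * (N - n))


def normalized_precision(relevant_ranks, collection_size):
    """1 - (sum of log ranks - sum of ideal log ranks) / log(N! / ((N - n)! n!))."""
    n, N = _check(relevant_ranks, collection_size)
    actual = sum(math.log(r) for r in relevant_ranks)
    ideal = sum(math.log(i) for i in range(1, n + 1))
    # log of the binomial coefficient, computed with lgamma to avoid huge factorials
    worst = math.lgamma(N + 1) - math.lgamma(N - n + 1) - math.lgamma(n + 1)
    return 1.0 - (actual - ideal) / worst


# Hypothetical ranks of three relevant documents in a 10-document collection.
abstract_ranks = [1, 4, 9]
full_text_ranks = [1, 3, 6]
for label, ranks in (("abstracts", abstract_ranks), ("full text", full_text_ranks)):
    print(label, normalized_recall(ranks, 10), normalized_precision(ranks, 10))
```

Both measures equal 1 when the relevant documents occupy the top ranks and 0 when they occupy the bottom ranks, so a small difference between abstracts and full text appears as a small gap between two values on the same scale.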
The precision/recall curves do cross over at the high recall end with stem and at the high precision end with thesaurus (Figure 19), but this is due in the former case to the fact that 10% of the relevant documents have zero correlation with the request when abstracts are used, and the ranks assigned to these documents are higher than the ranks given when full text is in use. The recall ceiling data are given in Figure 20, where it is seen that high ceilings are present, with the expected superiority of full text.
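To make the recall ceiling concrete, the following is a minimal sketch. It assumes that the ceiling is the fraction of relevant documents having at least one term in common with the request, and that coordination level matching scores a document by the number of request terms it contains. The function names and the toy term sets are hypothetical and are not drawn from the test collections.

```python
def coordination_level(request_terms, document_terms):
    """Number of distinct request terms that also occur in the document."""
    return len(set(request_terms) & set(document_terms))


def recall_ceiling(request_terms, documents, relevant_ids):
    """Fraction of relevant documents with a non-zero match to the request.

    Relevant documents with zero correlation cannot be ranked by the
    matching function itself, so they limit the recall that matching
    alone can reach.
    """
    if not relevant_ids:
        raise ValueError("at least one relevant document is required")
    matched = [
        doc_id
        for doc_id in relevant_ids
        if coordination_level(request_terms, documents[doc_id]) > 0
    ]
    return len(matched) / len(relevant_ids)


# Toy data (hypothetical): the same documents indexed from abstract and full text.
request = {"boundary", "layer", "flow"}
abstracts = {
    "d1": {"boundary", "layer"},
    "d2": {"wing", "flutter"},
    "d3": {"heat", "transfer"},
}
full_text = {
    "d1": {"boundary", "layer", "flow", "plate"},
    "d2": {"wing", "flutter", "flow"},
    "d3": {"heat", "transfer", "tube"},
}
relevant = ["d1", "d2"]

print(recall_ceiling(request, abstracts, relevant))  # d2 has no matching term
print(recall_ceiling(request, full_text, relevant))  # both relevant documents match
```

In this toy case the longer full-text term sets give the relevant document d2 a non-zero match that its abstract lacks, which is the kind of effect that raises the ceiling for full text.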