IRS13
Scientific Report No. IRS-13 Information Storage and Retrieval
Document Length
chapter
E. M. Keen
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
judgment, and using coordination level matching (which is virtually identical
to overlap-logical matching on SMART) obtained at Cranfield are given in
Figure 17. The differences between test techniques in the case of SMART
and Cranfield reside in the dictionaries used (the Cranfield word forms language
is similar, but not identical, to the SMART stem dictionary), and also in the
methods of calculating the average recall/precision curve. It is seen that
this last matter is still a partly unsolved problem, since the two Cranfield
plots presented are not totally consistent. Figure 17(b) comes closest
to the methods used by SMART, and comparison with Figure 13 shows a similar
result except at the low recall end. It would seem that the addition of a
weighting scheme, as used in SMART, does not help the title performance much,
but does improve the abstracts, so that in circumstances where such weighting
may be practiced even the Cranfield results do show a reasonable superiority
of abstracts over titles.
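
As an illustration of the unweighted matching referred to above, coordination level matching may be sketched as a count of the distinct terms shared by a request and a document. This is a minimal sketch only: the function names and the tie-breaking rule (document identifier order) are assumptions made for the example, and neither the Cranfield nor the SMART implementation is reproduced here.

```python
def coordination_level(request_terms, document_terms):
    """Return the number of distinct terms shared by request and document."""
    return len(set(request_terms) & set(document_terms))


def rank_by_coordination(request_terms, documents):
    """Rank document identifiers by descending coordination level.

    `documents` maps a document identifier to its list of index terms.
    Ties are broken by identifier; this rule is an assumption of the
    sketch, not something stated in the report.
    """
    scores = {
        doc_id: coordination_level(request_terms, terms)
        for doc_id, terms in documents.items()
    }
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
```

No term weights enter the score, which is the point of contrast with the weighting scheme used in SMART.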
B) Abstracts versus Full Text
Overall performance measures are given in Figures 18 and 19. Seven
comparisons of abstracts and full text are given in Figure 18 using the
normalized measures, and two comparisons using precision/recall curves in
Figure 19. In all cases the full text is superior to the abstract alone, but
the difference is always small. The precision/recall curves do cross over at
the high recall end with the stem dictionary and at the high precision end
with the thesaurus (Figure 19), but this is due in the former case to the fact
that 10% of the relevant documents have zero correlation with the request when
abstracts are used, and the ranks assigned to these documents are higher than
the ranks given when full text is in use.
The recall ceiling data are given in Figure 20, where it is seen
that high ceilings are present, with the expected superiority of full text.
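
The notion of a recall ceiling can be sketched as follows, on the assumption that the ceiling is the fraction of relevant documents having at least one term in common with the request, so that a relevant document with zero correlation can never be retrieved by matching. The function name and the data layout are hypothetical and serve only to illustrate the idea.

```python
def recall_ceiling(request_terms, documents, relevant_ids):
    """Fraction of relevant documents sharing at least one request term.

    `documents` maps a document identifier to its list of index terms
    (taken from the abstract or the full text, as the case may be).
    Assumption of this sketch: a document is reachable if and only if
    it has a non-empty term overlap with the request.
    """
    request = set(request_terms)
    relevant = list(relevant_ids)
    if not relevant:
        raise ValueError("at least one relevant document is required")
    reachable = sum(1 for doc_id in relevant if request & set(documents[doc_id]))
    return reachable / len(relevant)
```

Under this assumption a longer document representation can only add matching terms, which is consistent with the higher ceiling expected for full text.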