IRS13
Scientific Report No. IRS-13 Information Storage and Retrieval
Test Environment
E. M. Keen
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
number of matching terms or some constant threshold correlation coefficient.
In an output graph, the effect of stem and thesaurus as a recall device can
be seen when a threshold correlation coefficient of, say, 0.35 is applied
to the search output. But such an effect cannot be detected in the complete
precision versus recall curves that are normally used for evaluation. In
particular, it is not correct to say that recall devices will improve the high
recall end of the curve while precision devices improve the
high precision end. The only importance of the devices is that they become
the means by which the specificity of the index language is altered; a
dictionary that provides optimum specificity for a given test environment
will exhibit a precision versus recall curve that is superior to all others
probably over the whole performance range.
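The threshold cut-off described above may be illustrated by a minimal sketch. The function name, the document identifiers, the correlation values and the relevance judgments are all hypothetical and chosen only for illustration; the sketch assumes that a document is retrieved when its correlation with the query is at or above the threshold.

```python
def precision_recall_at_threshold(correlations, relevant, threshold):
    """Return (precision, recall) for the documents whose correlation
    coefficient is at or above the threshold.

    correlations: dict mapping document id -> query-document correlation
    relevant:     set of document ids judged relevant to the query
    """
    retrieved = {doc for doc, c in correlations.items() if c >= threshold}
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search output for one query (illustrative values only).
correlations = {"d1": 0.62, "d2": 0.41, "d3": 0.36, "d4": 0.30, "d5": 0.12}
relevant = {"d1", "d3", "d4"}

p, r = precision_recall_at_threshold(correlations, relevant, 0.35)
print(f"precision={p:.3f} recall={r:.3f}")
```

A single threshold yields only one precision and recall pair per query, whereas the complete precision versus recall curve is obtained by considering every cut-off point in turn; this is one way of seeing why an effect visible at a fixed threshold need not appear in the complete curve.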
The index language of optimum specificity in the Cranfield Project
was found to be a stem type language. This result is given in a table of
normalized recall ratios versus number of terms in the language (see Fig. 15, [11]).
A plot of this type is included in Fig. 9, giving SMART results for three
collections in addition to the Cranfield Project result. The Cranfield Project
normalized recall curve is calculated differently from the SMART measure,
so that no significance should be attached to the positions on the plot. The
peak point of each curve shows the optimum dictionary; and whereas between
500 and 700 dictionary concepts produce optimum results for all three SMART
collections, the Cranfield Project found that 2,500 was the optimum number.
The interpretation of such plots needs further experimentation, since a count
of the total concepts in a dictionary does not reflect the presence of a word
in more than one concept, and the method employed is biased by collection size.
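For reference, a rank-based normalized recall of the kind commonly associated with SMART evaluation may be sketched as follows. The formula is not stated in this section and is given here as an assumption: one minus the difference between the actual and ideal rank sums of the relevant documents, divided by n(N - n), where n is the number of relevant documents and N the collection size. The function name and the example ranks are hypothetical.

```python
def normalized_recall(relevant_ranks, collection_size):
    """Rank-based normalized recall (assumed form, not taken from this section).

    relevant_ranks:  1-based ranks at which the relevant documents appear
    collection_size: total number of documents ranked (N)
    """
    n = len(relevant_ranks)
    big_n = collection_size
    if n == 0 or n >= big_n:
        raise ValueError("need at least one relevant and one non-relevant document")
    ideal_sum = n * (n + 1) // 2          # ranks 1..n in a perfect search
    actual_sum = sum(relevant_ranks)
    return 1.0 - (actual_sum - ideal_sum) / (n * (big_n - n))


# Illustrative ranks in a hypothetical collection of 10 documents.
print(normalized_recall([1, 2, 3], 10))   # relevant documents ranked first
print(normalized_recall([8, 9, 10], 10))  # relevant documents ranked last
print(normalized_recall([1, 3, 5], 10))   # an intermediate case
```

Under this assumed form the measure runs from 1 for a perfect ranking to 0 for the worst possible ranking. The denominator depends on the collection size, so values obtained on collections of different sizes are not directly comparable.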