Scientific Report No. IRS-13 Information Storage and Retrieval Test Environment chapter E. M. Keen Harvard University Gerard Salton Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
number of matching terms or some constant threshold correlation coefficient. In an output graph, the effect of stem and thesaurus as a recall device can be seen when a threshold correlation coefficient of, say, 0.35 is applied to the search output. Such an effect cannot, however, be detected in the complete precision versus recall curves that are normally used for evaluation. In particular, it is not correct to say that recall devices will improve the high recall end of the curve and that precision devices will improve the high precision end. The devices matter only as the means by which the specificity of the index language is altered; a dictionary that provides optimum specificity for a given test environment will exhibit a precision versus recall curve that is superior to all others, probably over the whole performance range.
The optimum specificity of index language in the Cranfield Project was found to be a stem type language. This result is given in a table of normalized recall ratios versus number of terms in the language (see Fig. 15, [11]). A plot of this type is included in Fig. 9, giving SMART results for three collections in addition to the Cranfield Project result. The Cranfield Project normalized recall curve is calculated differently from the SMART measure, so that no significance should be attached to the relative positions of the curves on the plot. The peak point of each curve shows the optimum dictionary: whereas between 500 and 700 dictionary concepts produce optimum results for all three SMART collections, the Cranfield Project found that 2,500 was the optimum number.
The interpretation of such plots needs further experimentation, since a count of the total concepts in a dictionary does not reflect the presence of a word in more than one concept, and the method employed is biased by collection size.
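The two kinds of measurement discussed above can be illustrated with a minimal sketch. It is not the SMART or Cranfield implementation. The document identifiers, correlation values, and relevance judgments are invented for illustration, and the function names are hypothetical. The normalized recall shown is the rank-based form usually associated with SMART evaluation, assumed here because this section does not give the formula; the Cranfield Project calculation differs and is not reproduced.

```python
from typing import Dict, Set, Tuple


def precision_recall_at_threshold(
    scores: Dict[str, float], relevant: Set[str], threshold: float
) -> Tuple[float, float]:
    """Precision and recall when only documents scoring at or above
    the threshold correlation coefficient are treated as retrieved."""
    retrieved = {doc for doc, score in scores.items() if score >= threshold}
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def normalized_recall(scores: Dict[str, float], relevant: Set[str]) -> float:
    """Rank-based normalized recall (assumed form):
    1 - (sum of relevant ranks - sum of ideal ranks) / (n * (N - n)).
    Assumes every relevant document appears in `scores` and 0 < n < N."""
    ranking = sorted(scores, key=lambda doc: scores[doc], reverse=True)
    total = len(ranking)            # N: documents in the collection
    n_rel = len(relevant)           # n: relevant documents
    actual = sum(rank for rank, doc in enumerate(ranking, start=1)
                 if doc in relevant)
    ideal = n_rel * (n_rel + 1) // 2  # ranks 1..n in a perfect ordering
    return 1.0 - (actual - ideal) / (n_rel * (total - n_rel))


# Hypothetical search output: document -> correlation with the request.
scores = {"d1": 0.62, "d2": 0.41, "d3": 0.36,
          "d4": 0.30, "d5": 0.12, "d6": 0.05}
relevant = {"d1", "d3", "d4"}  # hypothetical relevance judgments

precision, recall = precision_recall_at_threshold(scores, relevant, 0.35)
r_norm = normalized_recall(scores, relevant)
```

The threshold measure depends on where the cut-off of 0.35 falls in the ranked output, whereas normalized recall summarizes the whole ranking in a single figure. A change in retrieval at one particular cut-off can therefore be clear in the first measure and much less apparent in the second.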