Scientific Report No. ISR-13, Information Storage and Retrieval. Chapter by E. M. Keen; Gerard Salton, Harvard University. Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

"theoretical best"), and another where they are completely the opposite of the expected pattern (described as "theoretical worst"). It is important to notice that the rank positions occupied by the seven relevant documents are not altered, but that some of the relevant documents exchange their positions so as to obtain the desired relevance grade orders. With these three stipulated rank orders for each request in the set, three average precision versus recall curves can be drawn, using in particular the methods described in Section II, part 5. Results for the set used as an example are given in Fig. 11. The plot shows that in this instance the ordering by relevance grade seems to be almost random, since the curve of the "actual search result" falls midway between the theoretical best and worst. Further retrieval runs could be tried, but great differences from these simulated situations are not expected. Calculation of the curves based on relevance grades for a thesaurus run has shown that the difference in merit between that thesaurus dictionary and suffix 's', using the relevance grade scores to obtain recall, is virtually identical to the difference in merit between the two runs when no relevance grades are allowed. This means that, with these two dictionaries at any rate, it is apparent that one dictionary is not more effective than another in retrieving relevant documents of particular relevance grades. These results are in accord with similar tests made on the same data in the Cranfield Project [12, page 215].
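The construction described above can be sketched as follows. This is a minimal illustration, not the report's actual evaluation program: the function name, the example rank positions, and the grade values are all hypothetical. It keeps the rank positions of the relevant documents fixed, permutes only the relevance grades assigned to those positions, and computes precision against a grade-weighted recall, so that the "theoretical best" and "theoretical worst" orderings bound the actual result.

```python
# Sketch (hypothetical names and data): precision vs. grade-weighted recall
# for one request. The ranks at which relevant documents appear are fixed;
# only the assignment of relevance grades to those ranks is permuted.

def precision_grade_recall(ranks, grades):
    """ranks: sorted 1-based rank positions of the relevant documents.
    grades: relevance grade of the relevant document found at each rank.
    Returns a list of (recall, precision) points, where recall is the
    fraction of total relevance-grade weight retrieved so far."""
    total = sum(grades)
    points = []
    retrieved_weight = 0.0
    for count, (rank, grade) in enumerate(zip(ranks, grades), start=1):
        retrieved_weight += grade
        # Precision is the usual count-based measure at this rank;
        # recall is weighted by the grades of the documents retrieved.
        points.append((retrieved_weight / total, count / rank))
    return points

# Seven relevant documents at fixed ranks, graded 1 (low) to 4 (high).
ranks = [1, 3, 4, 9, 15, 22, 40]          # hypothetical search output
actual = [2, 4, 1, 3, 2, 4, 1]            # grades in retrieved order
best = sorted(actual, reverse=True)       # "theoretical best": high grades first
worst = sorted(actual)                    # "theoretical worst": low grades first

curve_actual = precision_grade_recall(ranks, actual)
curve_best = precision_grade_recall(ranks, best)
curve_worst = precision_grade_recall(ranks, worst)
```

Because the rank positions never change, the precision values are identical across the three orderings; the curves differ only in how quickly grade-weighted recall accumulates, which is exactly the effect the simulated best and worst cases are meant to bound.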
The conclusion that there is no strong correlation between degree of relevance and ease of retrieval is probably due to the difficulty of making the relevance grade judgments in the first place.