Scientific Report No. ISR-13 Information Storage and Retrieval
Test Environment
E. M. Keen
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
"theoretical best"), and another where they are completely the opposite
of the expected pattern (described as "theoretical worst"). It is important
to notice that the rank positions occupied by the seven relevant
documents are not altered, but that some of the relevant documents exchange
their positions so as to obtain the desired relevance grade orders.
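The point that only the grade ordering changes, never the rank positions themselves, can be made concrete with a small sketch. The rank positions and grade values below are illustrative, not figures from the report; grade 1 is taken to mean most relevant.

```python
def precision_recall_points(ranks, total_relevant):
    """Standard (ungraded) precision at each relevant-document rank.

    Returns a list of (recall, precision) points, one per relevant
    document, computed from the rank at which it was retrieved.
    """
    points = []
    for i, r in enumerate(sorted(ranks), start=1):
        points.append((i / total_relevant, i / r))
    return points

# Fixed rank positions of the seven relevant documents (illustrative).
ranks = [1, 3, 4, 8, 15, 22, 40]

# Relevance grades attached to those positions, in the order the
# search actually produced them (illustrative).
actual = [2, 1, 3, 2, 4, 3, 4]

# "Theoretical best": most relevant grades earliest in the ranking.
best = sorted(actual)
# "Theoretical worst": the completely opposite pattern.
worst = sorted(actual, reverse=True)

# The ungraded precision-recall curve depends only on the rank
# positions, so it is identical for all three grade orderings.
curve = precision_recall_points(ranks, len(ranks))
```

Because the seven rank positions are shared by all three orderings, any difference among the three curves comes solely from how the grades are credited, which is what the comparison is designed to isolate.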
With these three stipulated rank orders for each request in the set,
three average precision versus recall curves can be drawn, using the
methods described in Section II, part 5. Results for the set used
as an example are given in Fig. 11. The plot shows that in this instance
the ordering by relevance grade seems to be almost random, since the
curve of "actual search result" falls mid-way between the theoretically
best and worst. Further retrieval runs could be tried, but it is not
believed that great differences will be seen when compared with these
simulated situations. Calculation of the curves based on relevance grades
for a thesaurus run has shown that the difference in merit between that
thesaurus dictionary and suffix 's', using the relevance grade scores to
obtain recall, is virtually identical to the difference between the two runs
when no relevance grades are allowed. This means that, with these two
dictionaries at any rate, one dictionary is not more effective than
another in retrieving relevant documents of particular relevance grades.
These results are in accord with similar tests made on the same data
in the Cranfield Project [12, page 215]. The conclusion that there is no
strong correlation between degree of relevance and ease in retrieval is
probably due to the difficulty of making the relevance grade judgments in
the first place.