Scientific Report No. IRS-13 Information Storage and Retrieval
Correlation Measures
K. Reitsma
J. Sagalyn
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
5. Experimental Results
The following functions were tested on the ADI collection, with four
coefficients compared in each test:
Table 1 - Overlap, Cosine, Parker-Rhodes-Needham, Reitsma-Sagalyn.
Table 2 - Average, Stiles, Reitsma-Sagalyn (sorted up), Cosine.
Table 3 - Cosine, I[OCRerr]rpersine, Maron-Kuhns, Reitsma-Sagalyn (modified).
The following were compared on the Cranfield collection:
Table 4 - Overlap, Cosine, Parker-Rhodes-Needham, Stiles.
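As a reminder of what two of these coefficients compute, here is a minimal Python sketch of the Cosine and Overlap coefficients for weighted term vectors. It assumes the common vector formulations (Cosine: inner product divided by the product of the vector lengths; Overlap: sum of componentwise minima divided by the smaller vector total). If the definitions given earlier in this chapter differ, those take precedence. The function names are hypothetical, and the remaining coefficients are not sketched.

```python
import math
from typing import Sequence


def cosine(x: Sequence[float], y: Sequence[float]) -> float:
    """Cosine coefficient: inner product over the product of vector lengths."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    # Define the coefficient as 0 when either vector is all zeros.
    return dot / norm if norm else 0.0


def overlap(x: Sequence[float], y: Sequence[float]) -> float:
    """Overlap coefficient (assumed form): shared weight over the smaller total weight."""
    shared = sum(min(a, b) for a, b in zip(x, y))
    denom = min(sum(x), sum(y))
    return shared / denom if denom else 0.0
```

For the vectors (2, 1, 0) and (1, 1, 1), the Cosine value is 3/sqrt(15), about 0.775, and the Overlap value is 2/3. The two coefficients can therefore rank the same pair of items differently, which is what the tests compare.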
The tables contain the averages from which the average recall-precision graphs
were drawn, together with the standard deviation (S.D.D.) of each average.
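The averaging step behind the tables can be sketched as follows. The sketch assumes that each query contributes one precision value at each of a common set of recall levels. The function name `average_curve` is hypothetical. Using the sample standard deviation (`statistics.stdev`) is also an assumption, since the report does not say which estimator underlies the S.D.D. column.

```python
from statistics import mean, stdev


def average_curve(per_query: list[list[float]]) -> list[tuple[float, float]]:
    """Return (mean precision, sample S.D.) at each recall level, over all queries."""
    if not per_query:
        raise ValueError("need at least one query")
    n_levels = len(per_query[0])
    if any(len(row) != n_levels for row in per_query):
        raise ValueError("every query must report the same recall levels")
    curve = []
    for level in range(n_levels):
        # Precision values of all queries at this recall level.
        values = [row[level] for row in per_query]
        # stdev needs at least two values; a single query has no spread.
        sd = stdev(values) if len(values) > 1 else 0.0
        curve.append((mean(values), sd))
    return curve
```

Plotting the means against the recall levels gives one averaged recall-precision curve per coefficient. The S.D. at each level shows how much individual queries scatter around that point.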
The data in the tables are summarized in Figure 1, which shows
the performance of all coefficients tested on the ADI collection.
Judged by the recall and precision measures discussed earlier as a
means of evaluating the performance of different correlation coefficients,
Figure 1 yields the following results:
1) Three correlation functions, Stiles, Cosine and Parker-Rhodes-Needham,
perform decidedly better than the others. They have been replotted on a
larger scale in Figure 3 to show the differences in their behavior in
more detail.
2) In the recall interval 0-0.50, the Parker-Rhodes-Needham
coefficient performs better than the other two; above 0.50 recall,
it performs worse than both. This indicates that the
Parker-Rhodes-Needham function gives the best results in a system
operated with a recall cutoff below 0.50.
3) Of the Cosine and Stiles coefficients, the Cosine coefficient performs
better below 0.55 recall, while at higher recall values the two perform
almost identically. Since the Cosine coefficient is never noticeably worse,
it is the better of the two over the entire recall interval.
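Observations 2) and 3) rest on two operations. The first computes recall and precision when retrieval stops at a cutoff. The second reads off, level by level, which of two averaged precision curves is higher. The sketch below makes several assumptions: documents are identified by strings, `k` is a rank cutoff, and `tol` is a hypothetical threshold below which precision differences count as "almost identical." Both function names are illustrative and do not come from the report.

```python
from typing import Sequence


def recall_precision_at(ranked: Sequence[str], relevant: set[str], k: int) -> tuple[float, float]:
    """Recall and precision when retrieval is cut off after the top k documents."""
    if k <= 0 or not relevant:
        raise ValueError("k must be positive and the relevant set non-empty")
    retrieved = ranked[:k]
    if not retrieved:
        raise ValueError("nothing retrieved; precision is undefined")
    hits = sum(1 for doc in retrieved if doc in relevant)
    return hits / len(relevant), hits / len(retrieved)


def compare_curves(levels: Sequence[float], prec_a: Sequence[float],
                   prec_b: Sequence[float], tol: float = 0.01) -> list[tuple[float, str]]:
    """Label each recall level 'A', 'B', or 'tie' by which curve has higher precision."""
    if not (len(levels) == len(prec_a) == len(prec_b)):
        raise ValueError("curves must share the same recall levels")
    labels = []
    for level, a, b in zip(levels, prec_a, prec_b):
        if abs(a - b) <= tol:
            labels.append((level, "tie"))  # difference too small to matter
        elif a > b:
            labels.append((level, "A"))
        else:
            labels.append((level, "B"))
    return labels
```

Under these assumptions, a coefficient "better below 0.55 recall and almost identical above" would show up as a run of one curve's label followed by a run of ties. Choosing `tol` is a judgment call that the report's text does not fix.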