IRS13
Scientific Report No. IRS-13 Information Storage and Retrieval
Correlation Measures
K. Reitsma
J. Sagalyn
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
4) These conclusions are further supported by the almost equivalent
values of the standard deviations of the respective functions.
5) The other functions perform strictly below the above-mentioned
coefficients. Only the Overlap coefficient approaches the three
best, and only above 0.75 recall, a region that is fairly
insignificant in practice.
However, the four best functions, when tested with the Cranfield
Collection, exhibit a different behavior:
1) The differences between the functions have increased.
2) The Cosine function shows a better performance than the other
three (i.e. the Parker-Rhodes-Needham, Stiles, and Overlap
coefficients).
3) The Parker-Rhodes-Needham is not close to the Cosine anymore;
it is the worst of the four.
4) The performance of the Overlap is no longer the worst; in fact,
it remains very close to the Cosine and Stiles coefficients.
5) The standard deviation of the Cosine function is much smaller
than for the other functions. This supports the conclusion that
this function is better than the rest in this collection.
6) The overall precision at the same recall is lower in the Cran-
field collection than in the ADI collection.
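As an illustration of how two of the compared coefficients can rank the same documents differently, the following minimal sketch computes a Cosine and an Overlap coefficient on small term vectors. The formulas are the commonly cited textbook forms and are assumed here; they may differ in detail from the definitions used in this report. The function names and the example vectors are hypothetical.

```python
import math

def cosine(x, y):
    """Cosine coefficient: inner product divided by the product of vector lengths.
    (Assumed textbook form, not necessarily the exact form tested in the report.)"""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return dot / norm if norm else 0.0

def overlap(x, y):
    """Overlap coefficient: shared weight divided by the smaller total weight.
    (Assumed textbook form for non-negative term weights.)"""
    shared = sum(min(a, b) for a, b in zip(x, y))
    smaller = min(sum(x), sum(y))
    return shared / smaller if smaller else 0.0

# Hypothetical binary term vectors over a four-term vocabulary.
query = [1, 1, 0, 0]
short_doc = [1, 1, 0, 0]   # contains exactly the query terms
long_doc = [1, 1, 1, 1]    # contains the query terms plus two others

for name, doc in (("short_doc", short_doc), ("long_doc", long_doc)):
    print(name, round(cosine(query, doc), 4), round(overlap(query, doc), 4))
```

In this toy case the Cosine coefficient drops for the longer document because the additional terms enlarge the denominator, while the Overlap coefficient does not distinguish the two documents. This is only meant to show how the two measures can diverge; it says nothing about their measured performance on either collection.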
6. Discussion
In this section, an attempt is made to explain the behavior of the
various coefficients and to suggest possible modifications for future
investigations.
Cosine:
The Cosine function shows a consistently high performance in both the
ADI and Cranfield collections. Since it is length dependent and since the
Hypersine tries to reduce this dependence unsuccessfully, a compromise some-