Scientific Report No. IRS-13 Information Storage and Retrieval
Correlation Measures
K. Reitsma
J. Sagalyn
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
$$
\frac{\sum_{i=1}^{t} v_i w_i w_i}{\left( \sum_{i=1}^{t} v_i v_i w_i \cdot \sum_{i=1}^{t} w_i w_i w_i \right)^{1/2}}
$$
This correlation coefficient is basically the same as the Cosine
coefficient already described, except that within each summation another
factor, $w_i$, has been added. The numerical effect of this added $w_i$ factor is
zero, since it can be divided out. The effect of the factor is to reduce
the magnitude of the document vector length (the vector $v$ being the
document description vector), since the product $v_i v_i w_i$ is zero when
$v_i > 0$ but $w_i = 0$. This term is therefore positive only when both $v_i$
and $w_i$ are greater than 0, i.e. when concept $i$ is found in both $w$
and $v$. In other words, the length of the document vector is calculated
in the subspace of the request space. Since the document vector is usually
much longer than the request vector, this coefficient reduces the dependency
of the Cosine function on vector length.
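The length effect described above can be illustrated with a small sketch. It assumes the coefficient has the form: sum of v_i * w_i * w_i, divided by the square root of (sum of v_i * v_i * w_i) times (sum of w_i * w_i * w_i). This form is read from the damaged formula and is not confirmed by the source. The function names `cosine` and `request_weighted_cosine` and the sample vectors are hypothetical, chosen only for illustration.

```python
import math


def cosine(v, w):
    """Ordinary Cosine coefficient between document vector v and request vector w."""
    num = sum(a * b for a, b in zip(v, w))
    den = math.sqrt(sum(a * a for a in v) * sum(b * b for b in w))
    return num / den if den else 0.0


def request_weighted_cosine(v, w):
    """Cosine-like coefficient with an extra w_i factor in every summation.

    Assumed form (reconstructed, not confirmed by the source):
        sum(v_i * w_i * w_i) / sqrt(sum(v_i * v_i * w_i) * sum(w_i * w_i * w_i))
    """
    num = sum(a * b * b for a, b in zip(v, w))
    doc_len = sum(a * a * b for a, b in zip(v, w))  # zero wherever w_i = 0
    req_len = sum(b * b * b for b in w)
    den = math.sqrt(doc_len * req_len)
    return num / den if den else 0.0


# Hypothetical data: a long document vector and a short request vector.
document = [1, 1, 1, 1, 1, 1]
request = [1, 1, 0, 0, 0, 0]

plain = cosine(document, request)
weighted = request_weighted_cosine(document, request)
print(plain, weighted)
```

In this example the four document concepts that are absent from the request lower the ordinary Cosine value, while the weighted form ignores them because the document length is computed only over concepts with nonzero request weight.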
D) The Overlap Coefficient
In an effort to measure the amount of overlap between two vectors,
the following formula was proposed:
$$
OL = \frac{\sum_{i} \min(v_i, w_i)}{\min\left( \sum_{i} v_i,\ \sum_{i} w_i \right)}
$$
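As a minimal sketch, the Overlap coefficient can be computed as the sum of the element-wise minima of the two vectors divided by the smaller of the two vector sums. The function name `overlap` and the sample weights are hypothetical, and non-negative concept weights are assumed.

```python
def overlap(v, w):
    """Overlap coefficient: sum of min(v_i, w_i) over min(sum(v), sum(w)).

    Assumes v and w are equal-length sequences of non-negative weights.
    """
    num = sum(min(a, b) for a, b in zip(v, w))
    den = min(sum(v), sum(w))
    return num / den if den else 0.0


# Hypothetical weights for a document and a request over four concepts.
document = [2, 0, 1, 3]
request = [1, 1, 0, 3]
print(overlap(document, request))  # 4 / 5 = 0.8
```

Here the element-wise minima are 1, 0, 0 and 3, giving a numerator of 4, and the smaller vector sum is 5 (the request), so the coefficient is 0.8.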