Scientific Report No. IRS-13 Information Storage and Retrieval
Correlation Measures
K. Reitsma
J. Sagalyn
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
B) The Cosine Coefficient
This function was proposed by Salton and has the following form:

$$C = \frac{\sum_{i=1}^{t} v_i w_i}{\sqrt{\sum_{i=1}^{t} (v_i)^2 \cdot \sum_{i=1}^{t} (w_i)^2}}$$
It is used as a term-term association coefficient as well as a
document-document correlation coefficient. In both cases its interpretation
is the same. If v and w are t-dimensional vectors, then C is the direction
cosine, in the term space or document space, of the angle subtended by the
vectors v and w. The interpretation also does not depend on the type of
vectors used, whether they be binary or weighted.
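
As an illustration only, the coefficient can be sketched in Python as follows. The function name cosine_coefficient and the treatment of an all-zero vector are assumptions made for this sketch and do not come from the report.

```python
import math
from typing import Sequence


def cosine_coefficient(v: Sequence[float], w: Sequence[float]) -> float:
    """Cosine of the angle between two t-dimensional vectors v and w."""
    if len(v) != len(w):
        raise ValueError("v and w must have the same dimension t")
    # Numerator: inner product of v and w.
    inner = sum(vi * wi for vi, wi in zip(v, w))
    # Denominator terms: squared lengths of v and w in t-space.
    sq_v = sum(vi * vi for vi in v)
    sq_w = sum(wi * wi for wi in w)
    if sq_v == 0 or sq_w == 0:
        # Assumption (not from the report): an all-zero vector has no
        # direction, so return 0.0 instead of dividing by zero.
        return 0.0
    return inner / math.sqrt(sq_v * sq_w)


# Binary vectors: one shared concept, two concepts assigned to each vector.
binary = cosine_coefficient([1, 1, 0], [1, 0, 1])
# Weighted vectors pointing in the same direction with different weights.
weighted = cosine_coefficient([3, 4, 0], [6, 8, 0])
```

The same function serves binary and weighted vectors alike, which matches the remark that the interpretation does not depend on the type of vector used.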
Since the denominator is the product of the absolute lengths of
the vectors in t-space, it increases with an increase in the vector length.
If the two vectors are increased in length, the inner product will increase
by an amount equal to or less than the increase in the denominator. Since
the possible number of matching concepts tends to increase with increased
vector length, and since the cosine correlation nevertheless generally
decreases, this function has at least one serious fault, i.e., length
dependency.
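
The length dependency can be made concrete for binary vectors, where the inner product is the number of matching concepts and each squared length is the number of concepts assigned to the vector. The sketch below uses hypothetical counts chosen only for illustration; the function name binary_cosine is likewise an assumption.

```python
import math


def binary_cosine(matches: int, terms_v: int, terms_w: int) -> float:
    """Cosine coefficient for binary vectors, computed from concept counts.

    matches: number of concepts present in both vectors (inner product)
    terms_v: number of concepts assigned to v (squared length of v)
    terms_w: number of concepts assigned to w (squared length of w)
    """
    if matches > min(terms_v, terms_w):
        raise ValueError("matches cannot exceed the shorter vector's size")
    return matches / math.sqrt(terms_v * terms_w)


# Hypothetical short vectors: 4 concepts each, 2 of them matching.
short_pair = binary_cosine(2, 4, 4)
# Hypothetical longer vectors: 8 concepts each, 3 of them matching.
long_pair = binary_cosine(3, 8, 8)
```

In this made-up case the longer pair shares more concepts (3 against 2), yet its coefficient is lower (0.375 against 0.5), because the denominator grew faster than the inner product.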
C) The Hypersine Coefficient
This function was proposed in the work of Hall and [OCRerr]nning and is
designed to reduce the length dependency of the cosine function. The Hypersine
function is