Scientific Report No. IRS-13
Information Storage and Retrieval
Correlation Measures
K. Reitsma, J. Sagalyn
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

\[
\frac{\sum_{i=1}^{t} v_i w_i w_i}{\left[ \sum_{i=1}^{t} v_i v_i w_i \; \sum_{i=1}^{t} w_i w_i w_i \right]^{1/2}}
\]

This correlation coefficient is basically the same as the Cosine coefficient already described, except that within each summation another factor, w_i, has been added. The numerical effect of this added w_i factor is zero, since it can be divided out. The effect of the factor is to reduce the magnitude of the document vector length (the vector v being the document description vector), since the product v_i v_i w_i is zero when v_i > 0 but w_i = 0. This term is therefore positive only when both v_i and w_i are greater than 0, i.e. when concept i is found in both w and v. In other words, the length of the document vector is calculated in the subspace of the request space. Since the document vector is usually much longer than the request vector, this coefficient reduces the dependency on length of the Cosine function.

D) The Overlap Coefficient

In an effort to measure the amount of overlap between two vectors, the following formula was proposed:

\[
OL = \frac{\sum_{i=1}^{t} \min(v_i, w_i)}{\min\left( \sum_{i=1}^{t} v_i, \; \sum_{i=1}^{t} w_i \right)}
\]
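
The two coefficients described above can be illustrated with a minimal Python sketch. It assumes the modified Cosine adds one factor w_i inside each of the three summations of the ordinary Cosine, and that the Overlap coefficient is the sum of the element-wise minima divided by the smaller of the two vector sums. The function names (`cosine`, `request_subspace_cosine`, `overlap`) and the sample vectors are hypothetical and chosen only for illustration; they do not come from the report.

```python
import math


def cosine(v, w):
    """Ordinary Cosine coefficient between document vector v and request vector w."""
    num = sum(vi * wi for vi, wi in zip(v, w))
    den = math.sqrt(sum(vi * vi for vi in v) * sum(wi * wi for wi in w))
    return num / den if den else 0.0


def request_subspace_cosine(v, w):
    """Cosine with an extra w_i factor in each summation (assumed form).

    The document length term sum(v_i * v_i * w_i) only counts concepts
    that also occur in the request, i.e. where w_i > 0.
    """
    num = sum(vi * wi * wi for vi, wi in zip(v, w))
    doc_len = sum(vi * vi * wi for vi, wi in zip(v, w))
    req_len = sum(wi * wi * wi for wi in w)
    den = math.sqrt(doc_len * req_len)
    return num / den if den else 0.0


def overlap(v, w):
    """Overlap coefficient: sum of min(v_i, w_i) over min(sum v, sum w) (assumed form)."""
    num = sum(min(vi, wi) for vi, wi in zip(v, w))
    den = min(sum(v), sum(w))
    return num / den if den else 0.0


# Hypothetical example: a long document vector and a short binary request.
v = [3, 0, 2, 5, 1]
w = [1, 0, 1, 0, 0]

print(round(cosine(v, w), 3))                   # 0.566
print(round(request_subspace_cosine(v, w), 3))  # 0.981
print(overlap(v, w))                            # 1.0
```

In this example the document concepts absent from the request (the weights 5 and 1) lengthen the document vector in the ordinary Cosine but drop out of the modified form, which is why the second value is higher.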