Scientific Report No. IRS-13, Information Storage and Retrieval. Correlation Measures. K. Reitsma, J. Sagalyn. Harvard University. Gerard Salton. Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

B) The Cosine Coefficient

This function was proposed by Salton and has the following form:

$$C = \frac{\sum_{i=1}^{t} v_i w_i}{\sqrt{\sum_{i=1}^{t} (v_i)^2 \cdot \sum_{i=1}^{t} (w_i)^2}}$$

It is used as a term-term association coefficient as well as a document-document correlation coefficient. In both cases its interpretation is the same. If v and w are t-dimensional vectors, then C is the direction cosine, in the term space or document space, of the angle subtended by the vectors v and w. The interpretation also does not depend on the type of vectors used, whether they be binary or weighted.

Since the denominator is the product of the absolute lengths of the vectors in t-space, it increases with an increase in the vector length. If the two vectors are increased in length, the inner product will increase by an amount equal to or less than the increase in the denominator. Since the possible number of matching concepts tends to increase with increased vector length, and since the cosine correlation nevertheless generally decreases, this function has at least one serious fault, i.e. length dependency.

C) The Hypersine Coefficient

This function was proposed in the work of Hall and [illegible]nning and is designed to reduce the length dependency of the cosine function. The Hypersine function is