Scientific Report No. IRS-13, Information Storage and Retrieval
Correlation Measures (chapter), K. Reitsma and J. Sagalyn
Harvard University
Gerard Salton

Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

... by making the simple substitution [...] and a similar substitution for [...]. The first term in the expression for $\delta$ gives the number of documents containing both terms $v$ and $w$. The second term is proportional to the number of documents that would contain both $v$ and $w$ if $v$ and $w$ were random vectors. For random vectors $\delta = 0$, which gives a coefficient value of 0. For vectors with more or fewer matching documents than expected, $\delta$ is greater than or less than 0, respectively. The range of the function is therefore $-1 \le \text{M-K} \le +1$. A value of $+1$ signifies perfectly correlated terms, and $-1$ signifies perfectly negatively correlated terms.

When the Maron-Kuhns coefficient is adapted for use as a document-document correlation coefficient, its interpretation changes. The summations must now be taken from $i = 1$ to $d$, where $d$ is the number of concepts in the description vector. The formula then measures the number of concepts found in both document $v$ and document $w$ over and above the number expected if $v$ and $w$ were random vectors.

Further problems arise when the document description vectors are weighted rather than binary. One problem is complementation. To solve it, the complement of a vector element is defined as the maximum concept weight found in the entire collection containing that vector, minus the concept weight being complemented. A second problem concerns the zero elements of the document description vector. If the above method of complementation were used, the complement of a concept weight of zero would equal the maximum