Scientific Report No. IRS-13, Information Storage and Retrieval
Correlation Measures
K. Reitsma, J. Sagalyn
Harvard University
Gerard Salton

Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

$$\text{P-R-N} = \frac{\sum v_i w_i}{\sum v_i + \sum w_i - \sum v_i w_i}$$

where all summations are taken over $i = 1$ to $t$, and where $t$ equals the number of documents in the collection.

Since the term vectors are binary, the interpretation of the terms in the denominator is simple. The first term is the number of documents containing term $v$, the second is the number of documents containing term $w$, and the third is the number of documents containing both terms $v$ and $w$. Taken as a whole, the denominator gives the number of documents containing at least one of the terms. For two identical terms, the denominator equals the numerator and the association is 1. For two independent terms, where no document contains both terms, the numerator is zero and the association is 0.

When term-term associations are calculated, all the terms are usually compared with all the other terms at the same time, using matrix multiplication. The result is a matrix whose elements are terms of the above formula. Since matrix multiplication requires the calculation of many inner products, each of the entries in the association matrix is the result of an inner product, and therefore so is each term in the P-R-N formula. Thus, the summations $\sum v_i$ and $\sum w_i$ are in practice calculated by $v \cdot v$ and $w \cdot w$, which is the same as $\sum v_i v_i$ and $\sum w_i w_i$, which is the same as $\sum v_i^2$ and $\sum w_i^2$, where the summations are taken