Scientific Report No. IRS-13
Information Storage and Retrieval
Correlation Measures
K. Reitsma, J. Sagalyn
Harvard University
Gerard Salton

Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

It was originally proposed for binary term vectors, where the summations are taken over $i = 1, \ldots, t$, where $t$ equals the number of documents in the collection. Without any modification, the function may be used with weighted vectors, in which case the summations are taken over $i = 1, \ldots, d$, where $d$ equals the number of concepts in the description vector. The numerator of the function is the smallest vector in the document space consisting of elements from $v$ and $w$. It is divided by the smallest vector, either $v$ or $w$. In the case of weighted vectors, "smallest" means the least sum of weights.

E) The Maron-Kuhns Coefficient

This formula was originally proposed as a measure of association between index terms. Used with binary vectors, it measures the number of matching terms for two given term description vectors over and above the number of matching terms expected for purely random vectors. The formula is

$$\text{M-K} = \frac{\sum v_i w_i \cdot \sum \bar{v}_i \bar{w}_i \;-\; \sum v_i \bar{w}_i \cdot \sum \bar{v}_i w_i}{\sum v_i w_i \cdot \sum \bar{v}_i \bar{w}_i \;+\; \sum v_i \bar{w}_i \cdot \sum \bar{v}_i w_i}$$

where the symbol $\bar{x}_i$ is the complement of $x_i$, that is, if $x_i = 0$, $\bar{x}_i = 1$, and if $x_i = 1$, $\bar{x}_i = 0$. All summations are taken from $i = 1$ to $t$, where $t$ equals the number of documents. The vectors $v$ and $w$ are binary term description vectors. The numerator can be written as $t\delta$, where

$$\delta = \sum v_i w_i - \frac{\sum v_i \sum w_i}{t}$$