Scientific Report No. IRS-13 Information Storage and Retrieval
Correlation Measures
K. Reitsma
J. Sagalyn
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
It was originally proposed for binary term vectors, with the summations
taken over i = 1, ..., t, where t equals the number of documents in the
collection.
Without any modifications, the function may be used with weighted
vectors, in which case the summations are taken over i = 1, ..., d, where d
equals the number of concepts in the description vector.
The numerator of the function is the smallest vector in the document
space consisting of elements from V and w. It is divided by the
smaller vector, either V or w. In the case of weighted vectors, "smallest"
means the least sum of weights.
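The description above can be sketched in a few lines of Python. This is only an illustration under a stated assumption: the "smallest vector consisting of elements from V and w" is read here as the vector of elementwise minima, whose weights are then summed. The function name `min_ratio` and the zero-denominator convention are hypothetical and do not come from the report.

```python
def min_ratio(v, w):
    """Sum of elementwise minima divided by the smaller of the two weight sums.

    Assumes v and w are equal-length sequences of non-negative weights
    (binary vectors are the special case with weights 0 and 1).
    """
    if len(v) != len(w):
        raise ValueError("vectors must have the same length")
    # Numerator: total weight of the elementwise-minimum vector (assumed reading).
    numerator = sum(min(a, b) for a, b in zip(v, w))
    # Denominator: the least sum of weights of the two vectors.
    denominator = min(sum(v), sum(w))
    if denominator == 0:
        # Assumption: an all-zero vector matches nothing, so return 0.
        return 0.0
    return numerator / denominator
```

For the binary vectors `[1, 1, 0, 1]` and `[1, 0, 0, 1]` the numerator is 2 and the smaller sum is 2, so the ratio is 1.0; the second vector is entirely contained in the first.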
E) The Maron-Kuhns Coefficient
This formula was originally proposed as a measure of association
between index terms. Used with binary vectors, it measures the number of
matching terms for two given term description vectors over and above the
number of matching terms expected for purely random vectors. The formula is
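\[
\text{M-K} = \frac{\sum v_i w_i \cdot \sum \bar{v}_i \bar{w}_i \;-\; \sum v_i \bar{w}_i \cdot \sum \bar{v}_i w_i}
                  {\sum v_i w_i \cdot \sum \bar{v}_i \bar{w}_i \;+\; \sum v_i \bar{w}_i \cdot \sum \bar{v}_i w_i}
\]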
where the symbol $\bar{x}_i$ is the complement of $x_i$, that is, if $x_i = 0$,
$\bar{x}_i = 1$, and if $x_i = 1$, $\bar{x}_i = 0$. All summations are taken from i = 1 to t,
where t equals the number of documents. The vectors V and w are binary
term description vectors.
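A minimal Python sketch of the coefficient follows. It assumes the formula is the difference of the two cross products (both-present times both-absent, minus the two mismatch counts) divided by their sum. The function name `maron_kuhns` and the handling of a zero denominator are illustrative choices, not part of the report.

```python
def maron_kuhns(v, w):
    """Maron-Kuhns coefficient for two equal-length binary (0/1) vectors."""
    if len(v) != len(w):
        raise ValueError("vectors must have the same length")
    if any(x not in (0, 1) for x in list(v) + list(w)):
        raise ValueError("vectors must be binary")
    pairs = list(zip(v, w))
    both = sum(a * b for a, b in pairs)                 # sum of v_i * w_i
    neither = sum((1 - a) * (1 - b) for a, b in pairs)  # sum of complements
    only_v = sum(a * (1 - b) for a, b in pairs)         # v_i set, w_i not
    only_w = sum((1 - a) * b for a, b in pairs)         # w_i set, v_i not
    numerator = both * neither - only_v * only_w
    denominator = both * neither + only_v * only_w
    if denominator == 0:
        # Assumption: treat the undefined 0/0 case as "no association".
        return 0.0
    return numerator / denominator
```

With `v = [1, 1, 0, 0, 1, 0]` and `w = [1, 0, 0, 0, 1, 1]` the counts are 2 (both), 2 (neither), 1 and 1 (mismatches), giving (4 - 1) / (4 + 1) = 0.6. Two identical vectors that contain both ones and zeros give 1.0.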
The numerator can be written as $t\delta$, where
\[
\delta = \sum v_i w_i - \frac{\sum v_i \sum w_i}{t}
\]
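The rewriting of the numerator can be checked by expanding the complements. The short derivation below uses the shorthand a, p and q, which is introduced here only for convenience and is not the report's notation; it assumes binary vectors of length t.

```latex
\begin{align*}
a &= \sum v_i w_i, \qquad p = \sum v_i, \qquad q = \sum w_i, \\
\sum v_i \bar{w}_i &= p - a, \qquad
\sum \bar{v}_i w_i = q - a, \qquad
\sum \bar{v}_i \bar{w}_i = t - p - q + a, \\
a\,(t - p - q + a) - (p - a)(q - a) &= t\,a - p\,q
  = t \left( \sum v_i w_i - \frac{\sum v_i \sum w_i}{t} \right).
\end{align*}
```

The bracketed difference is the observed number of matching terms minus the number expected for two random vectors with the same totals, which agrees with the verbal description of the coefficient.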