IRS13
Scientific Report No. IRS-13 Information Storage and Retrieval
Correlation Measures
chapter
K. Reitsma
J. Sagalyn
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
concept weight in the collection. If the number of concepts in the thesaurus
is large and the number of concepts in any document description vector is
much smaller, a large number of zero elements will occur in a vector. When
these elements are complemented, all the elements will equal the maximum
concept number. In this case, the summation $\sum_i \bar{v}_i \bar{w}_i$ will be very large,
and its product with $\sum_i v_i w_i$ will be much larger than
$\sum_i v_i \bar{w}_i \sum_i \bar{v}_i w_i$,
giving a coefficient which will always be near 1. To avoid this problem,
only non-zero concepts are complemented.
In the ADI collection the maximum document weight is 96 and the
maximum query weight is 1[OCRerr]. The complement for an element in a document
vector or a query vector is, respectively,

    $\bar{v}_i = 96 - v_i$   or   $\bar{w}_i = w_{\max} - w_i$,

where $w_{\max}$ is the maximum query weight, if $v_i$ or $w_i$ is greater
than zero; otherwise the complement is zero.
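The complement rule can be sketched in Python as follows. This is only an illustration: the function name `complement`, the `nonzero_only` flag, and the sample vector are assumptions introduced here, not part of the original programs; 96 is the maximum ADI document weight quoted in the text.

```python
def complement(vector, max_weight, nonzero_only=True):
    """Return the complement of each weight relative to max_weight.

    With nonzero_only=True (the rule adopted in the text), zero elements
    stay zero. With nonzero_only=False every element is complemented,
    so each zero becomes max_weight.
    """
    result = []
    for weight in vector:
        if weight > 0 or not nonzero_only:
            result.append(max_weight - weight)
        else:
            result.append(0)
    return result


# Hypothetical sparse document vector (illustrative values only).
doc = [0, 12, 0, 0, 96, 0, 48, 0]

full = complement(doc, 96, nonzero_only=False)
restricted = complement(doc, 96)

print(full)        # [96, 84, 96, 96, 0, 96, 48, 96]
print(restricted)  # [0, 84, 0, 0, 0, 0, 48, 0]
```

In the unrestricted case every zero element turns into the maximum weight, which is what inflates the sums over complemented vectors when the vectors are sparse. Restricting the complement to non-zero concepts keeps those positions at zero.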
One further alteration, made in order to avoid negative correlation
coefficients, changes the range of the formula. It has been
adjusted so that the range is from 0 to +1 by adding 1 to the unadjusted
coefficient and dividing by 2.
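The range adjustment amounts to a simple linear rescaling. A minimal sketch follows, assuming the unadjusted coefficient lies between -1 and +1 (inferred from the stated 0 to +1 result); the function name `rescale` is hypothetical.

```python
def rescale(coefficient):
    """Map a coefficient from the range [-1, +1] onto [0, +1]."""
    if not -1.0 <= coefficient <= 1.0:
        raise ValueError("coefficient must lie between -1 and +1")
    return (coefficient + 1.0) / 2.0


for r in (-1.0, 0.0, 0.5, 1.0):
    print(r, rescale(r))
# -1.0 0.0
# 0.0 0.5
# 0.5 0.75
# 1.0 1.0
```

Note that an unadjusted value of 0, which indicates no association, maps to 0.5 rather than to 0 on the adjusted scale.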
F) The Parker-Rhodes-Needham Coefficient
This formula was originally proposed as a measure of association between
pairs of index terms, for use with binary term vectors. The function is