Scientific Report No. IRS-13 Information Storage and Retrieval
Correlation Measures
K. Reitsma
J. Sagalyn
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
I) The Reitsma-Sagalyn Coefficient
This function is based on the idea that the relative weights of
the matching concepts are very important. In other words, a matching
concept counts for more when its weights in the two vectors are equal
than when they are unequal. The function is
\[
\text{R-S} = \frac{\displaystyle\sum_{i=1}^{t} \frac{\min(v_i, w_i)}{\max(v_i, w_i)}}{N}
\]
where N equals the number of matching concepts. As an alternative, N
may equal the maximum number of concepts found in the document or request
description vector.
The range of this function is 0 to 1, where 0 indicates no
correlation and 1 indicates perfect correlation.
This function can be used with either binary or weighted vectors.
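As a minimal sketch of the coefficient described above, the following Python function computes it for two concept vectors. The sparse dictionary representation, the function name `reitsma_sagalyn`, and the `normalize_by` parameter are assumptions made for illustration and do not appear in the report. The second option for N is read here as the larger of the two vectors' concept counts, which is one possible interpretation of the text.

```python
from typing import Dict


def reitsma_sagalyn(v: Dict[str, float],
                    w: Dict[str, float],
                    normalize_by: str = "matching") -> float:
    """Sketch of the R-S coefficient for two concept vectors.

    v, w: hypothetical sparse vectors mapping a concept to a positive weight
          (use 1 for every present concept to get the binary case).
    normalize_by: "matching"   -> N is the number of matching concepts;
                  "max_length" -> N is the larger concept count of the two
                                  vectors (assumed reading of the alternative).
    """
    # A concept matches when it has a positive weight in both vectors.
    matching = [c for c in v if v[c] > 0 and w.get(c, 0) > 0]
    if not matching:
        return 0.0  # no shared concepts: no correlation

    # Each matching concept contributes min/max of its two weights,
    # which is 1 when the weights are equal and less than 1 otherwise.
    total = sum(min(v[c], w[c]) / max(v[c], w[c]) for c in matching)

    if normalize_by == "matching":
        n = len(matching)
    elif normalize_by == "max_length":
        n = max(sum(1 for x in v.values() if x > 0),
                sum(1 for x in w.values() if x > 0))
    else:
        raise ValueError("normalize_by must be 'matching' or 'max_length'")
    return total / n


# Illustrative (made-up) document and request vectors.
document = {"retrieval": 4, "storage": 2, "index": 1}
request = {"retrieval": 2, "storage": 2}
score_matching = reitsma_sagalyn(document, request)
score_max_length = reitsma_sagalyn(document, request, "max_length")
```

With these made-up vectors the two matching concepts contribute 2/4 and 2/2, so dividing by the number of matching concepts gives a higher value than dividing by the longer vector's length, which also penalizes the unmatched concept.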
The main problem with this function is that it depends entirely
upon the relative weights of the matching concepts. As described in the
beginning of this section, the requests are usually much shorter than the
abstracts from which the document descriptions are taken. It follows, then,
that the relative weights of the concepts in the request vector do not
indicate the relative importance of the concepts; that is, many weights tend
to be the same, and therefore the relative importance of the various concepts
cannot be determined.
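A small worked example may make this concrete. The numbers are invented for illustration, and the labeling of w as the request weights and v as the document weights is an assumption (the coefficient is symmetric in the two). Suppose two concepts match and N is the number of matching concepts. A short request in which both concepts receive the same weight, compared with a document that weights the second concept three times as heavily as the first, gives

\[
w = (1, 1),\quad v = (1, 3):\qquad
\text{R-S} = \frac{1}{2}\left(\frac{1}{1} + \frac{1}{3}\right) = \frac{2}{3}.
\]

If the request had been able to express the same emphasis as the document, the result would be

\[
w = (1, 3),\quad v = (1, 3):\qquad
\text{R-S} = \frac{1}{2}\left(\frac{1}{1} + \frac{3}{3}\right) = 1.
\]

The difference between the two values comes only from the request weights, which in a short request carry little information about what the user considers important.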
It therefore seems reasonable that this coefficient might prove more
powerful if it were used in a system with relevance feedback.
Ideally, it should be used in a system where the requests are such that the
user indicates the relative importance of various keywords in his request,