Scientific Report No. IRS-13, Information Storage and Retrieval. Correlation Measures (chapter). K. Reitsma, J. Sagalyn. Harvard University. Gerard Salton. Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

I) The Reitsma-Sagalyn Coefficient

This function is based on the idea that the relative weights of the matching concepts are very important. In other words, a matching concept should count for more when its weights in the two vectors are equal than when they are unequal. The function is

$$\text{R-S} = \frac{1}{N} \sum_{i=1}^{t} \frac{\min(v_i, w_i)}{\max(v_i, w_i)}$$

where v_i and w_i are the weights of the i-th concept in the two vectors being compared and N equals the number of matching concepts. As an alternative, N may equal the maximum number of concepts found in the document or request description vector. The range of this function is 0 to 1, where 0 indicates no correlation and 1 indicates perfect correlation. This function can be used with either binary or weighted vectors.

The main problem with this function is that it depends entirely upon the relative weights of the matching concepts. As described in the beginning of this section, the requests are usually much shorter than the abstracts from which the document descriptions are taken. It follows that the relative weights of the concepts in the request vector do not indicate the relative importance of the concepts: many weights tend to be the same, and therefore the relative importance of the various concepts cannot be determined.

It therefore seems reasonable that this coefficient might prove more powerful if it were used in a system with relevance feedback. Ideally, it should be used in a system where the requests are such that the user indicates the relative importance of various keywords in his request,
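The coefficient defined above can be illustrated with a minimal Python sketch. It is not taken from the report: the function name `reitsma_sagalyn` and the parameter `normalize_by_matches` are hypothetical. The sketch assumes non-negative weights and treats a concept as "matching" when its weight is nonzero in both vectors. It also reads the alternative N as the larger of the two vectors' nonzero-concept counts, which is one plausible interpretation of the text.

```python
from typing import Sequence


def reitsma_sagalyn(
    v: Sequence[float],
    w: Sequence[float],
    normalize_by_matches: bool = True,
) -> float:
    """Average min/max weight ratio over matching concepts (illustrative sketch).

    Assumptions (not stated in the report): weights are non-negative, and a
    concept "matches" when it has a nonzero weight in both vectors.
    """
    if len(v) != len(w):
        raise ValueError("vectors must have the same length")

    # One ratio per matching concept; equal weights give a ratio of 1.
    ratios = [min(a, b) / max(a, b) for a, b in zip(v, w) if a > 0 and b > 0]

    if normalize_by_matches:
        # N = number of matching concepts.
        n = len(ratios)
    else:
        # Assumed reading of the alternative: N = the larger count of
        # nonzero concepts in either vector.
        n = max(sum(1 for a in v if a > 0), sum(1 for b in w if b > 0))

    # With no concepts to compare, report no correlation.
    return sum(ratios) / n if n else 0.0


doc = [2, 0, 4, 1]
req = [1, 3, 4, 0]
print(reitsma_sagalyn(doc, req))                              # 0.75
print(reitsma_sagalyn(doc, req, normalize_by_matches=False))  # 0.5
```

In this example two concepts match, with ratios 1/2 and 4/4, so the coefficient is 0.75 when N is the number of matches. Under the alternative normalization each vector has three nonzero concepts, so N = 3 and the value drops to 0.5, which penalizes concepts that appear in only one vector.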