IRS13 Scientific Report No. IRS-13
Information Storage and Retrieval
Correlation Measures chapter: K. Reitsma, J. Sagalyn
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

The use of the factor l[OCRerr] is intended to simulate the original function. In essence, dividing by l[OCRerr]4 partially eliminates the effect of the weights and therefore approximates binary terms.

The definition of N presents some problems. Originally, N was intended to equal the number of concepts in the thesaurus, about 610. If this were done, however, the last two factors in the denominator could become negative. To avoid this problem, N is defined as (4)(610), the ([OCRerr]) being the average concept weight divided by 12, the base of the weighting system. ([OCRerr]8 was arbitrarily chosen as the average concept weight.) With this definition the coefficient is assured of being real. No attempt has been made to normalize it, so values greater than 1 are possible.

H) The Average Coefficient

This formula simply calculates the average weight of all those concepts that are found in both description vectors v and w. The formula is

$$AV = \frac{\sum_{i=1}^{d} \delta_i \,(x_i + y_i)}{2N}$$

where x_i and y_i are the weights of concept i in v and w, respectively,

$$\delta_i = \begin{cases} 1 & \text{if both } x_i > 0 \text{ and } y_i > 0 \\ 0 & \text{if either } x_i = 0 \text{ or } y_i = 0 \end{cases}$$

and N = \sum_{i} \delta_i is the number of matching concepts. The summation is taken over i = 1, ..., d, where d is the number of concepts in the description vector.

It was originally intended, time permitting, to use this function to determine whether it is more important to have fewer matching concepts at higher weights or more matching concepts at lower weights.
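The average coefficient can be sketched in a few lines. This is a minimal illustration, not the report's program: it assumes each description vector is an equal-length list of nonnegative concept weights, it follows the reconstructed numerator (the sum of both weights over matching concepts), and the function name `average_coefficient` is hypothetical. The report does not say what happens when there are no matching concepts, so returning 0 in that case is an assumption.

```python
from typing import Sequence


def average_coefficient(v: Sequence[float], w: Sequence[float]) -> float:
    """Average weight of the concepts present in both description vectors.

    Sketch of AV = sum_i delta_i * (x_i + y_i) / (2N), where delta_i = 1
    when concept i has a positive weight in both v and w, and N is the
    number of such matching concepts.
    """
    if len(v) != len(w):
        raise ValueError("description vectors must have the same length d")

    total = 0.0   # sum of both weights over the matching concepts
    matches = 0   # N, the number of matching concepts
    for x_i, y_i in zip(v, w):
        if x_i > 0 and y_i > 0:  # delta_i = 1
            total += x_i + y_i
            matches += 1

    # Assumption (not stated in the report): no matching concepts gives 0.
    if matches == 0:
        return 0.0
    return total / (2 * matches)


# Two matching concepts at high weight versus four at low weight.
few_high = average_coefficient([12, 12, 0, 0], [12, 12, 0, 0])
many_low = average_coefficient([4, 4, 4, 4], [4, 4, 4, 4])
print(few_high, many_low)  # 12.0 4.0
```

Because AV depends only on the weights of the matching concepts and not on how many there are, the pair of calls above separates the two cases the report wanted to compare: few matches at high weight score 12.0, many matches at low weight score 4.0.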