Scientific Report No. IRS-13 Information Storage and Retrieval
Correlation Measures
K. Reitsma
J. Sagalyn
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
where between the Cosine and H[OCRerr]ersine may prove effective if length
dependency inhibits the efficiency of the Cosine function.
H[OCRerr]ersine:
The H[OCRerr]ersine function performs worse than the Cosine. The
non-matching concepts of the document, which were deleted in calculating
the document vector length, therefore appear to be important. Evidently,
some degree of length dependency is beneficial in a matching function, and
the H[OCRerr]ersine eliminates this dependency incorrectly and to too
great a degree.
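
The effect described above can be illustrated with a minimal sketch. It
assumes that requests and documents are sparse weighted concept vectors and
that the length-reduced function is a Cosine whose document length is
computed over the matching concepts only. That reading is inferred from the
paragraph above, not taken from the report's formula, and the function names
and sample vectors are hypothetical.

```python
import math


def cosine(request, document):
    """Cosine correlation using the full length of both vectors."""
    dot = sum(w * document.get(c, 0.0) for c, w in request.items())
    norm_r = math.sqrt(sum(w * w for w in request.values()))
    norm_d = math.sqrt(sum(w * w for w in document.values()))
    return dot / (norm_r * norm_d) if norm_r and norm_d else 0.0


def matching_length_cosine(request, document):
    """Assumed variant: document length uses matching concepts only."""
    dot = sum(w * document.get(c, 0.0) for c, w in request.items())
    norm_r = math.sqrt(sum(w * w for w in request.values()))
    # Non-matching document concepts are dropped from the length (assumption).
    norm_d = math.sqrt(sum(document[c] ** 2 for c in request if c in document))
    return dot / (norm_r * norm_d) if norm_r and norm_d else 0.0


# Hypothetical data: both documents match the request on concepts a and b,
# but the long document also carries two heavily weighted unrelated concepts.
request = {"a": 1.0, "b": 1.0}
short_doc = {"a": 2.0, "b": 2.0}
long_doc = {"a": 2.0, "b": 2.0, "c": 4.0, "d": 4.0}

for name, doc in (("short", short_doc), ("long", long_doc)):
    print(name, cosine(request, doc), matching_length_cosine(request, doc))
```

With these sample vectors the Cosine ranks the short document above the long
one, while the assumed variant gives both the same score, since the
non-matching concepts no longer affect the result.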
Maron-Kuhns:
The performance of this function is far below that of the three good
functions. There are two possible explanations. One is the problem of
complementation: there may be a better way to define the complement of a
weighted vector. The second is the importance of the non-zero non-matching
weights. In this study, only the matching weights were complemented. It
might be advisable to complement the zero weights in one vector for those
concepts with non-zero weights in the other vector. It still does not seem
advisable to complement all the zero weights, for the same reasons as
stated previously.
Overlap:
The performance of the Overlap coefficient on the ADI and Cranfield
collections varies drastically. The explanation may lie in the differences between
the subject content of the two collections. Since the weights of the request
are usually less than the weights of the document, the numerator is not
strongly influenced by a matching concept with a very large weight in a