Scientific Report No. IRS-13 Information Storage and Retrieval
Correlation Measures
K. Reitsma
J. Sagalyn
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
IV-25
document. This insensitivity may explain the poor performance of this
function compared to some of the other coefficients.
Parker-Rhodes-Needham:
The striking difference in performance of this function in the
ADI collection, where it proved very powerful, and in the Cranfield
collection, where it performed rather poorly, is puzzling. Further evaluation with
other document collections is needed before any conclusions as to its value
can be made.
Stiles:
This coefficient shows consistently high performance for both the
ADI and Cranfield collections. It is far less sensitive to variations in
collection characteristics than the Overlap and the Parker-Rhodes-Needham
coefficients. The explanation of this phenomenon is difficult due to the
complexity of the formula; however, its quasi-binary character seems to
give reasonable results. One possible refinement may be a better definition
of N.
Reitsma-Sagalyn:
Three different modifications of this formula were used in this
study. In one of them N equals the number of concepts in either the query
vector or the document vector (the maximum of the two). Another form uses
the number of matching concepts for N. When this is done, it is
observed that many relevant documents occur at the end of the ranked list.
This leads to the third modification, in which the second form was used but
the documents were ranked in the reverse order. In general, this formula proved
ineffective.
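
The three modifications described above differ only in how N is chosen and in the direction of the ranking, which can be sketched as follows. This is a minimal illustration, not the procedure used in the study: the Reitsma-Sagalyn formula itself is not reproduced here, so the coefficient is passed in as a parameter, and the names n_max_concepts, n_matching_concepts, rank, and toy_coefficient are hypothetical. Queries and documents are assumed to be represented as sets of concept identifiers.

```python
from typing import Callable, Dict, List, Set

Concepts = Set[str]
# A coefficient takes (query, document, N) and returns a score.
Coefficient = Callable[[Concepts, Concepts, int], float]
NChooser = Callable[[Concepts, Concepts], int]


def n_max_concepts(query: Concepts, doc: Concepts) -> int:
    """First form: N is the larger of the two concept counts."""
    return max(len(query), len(doc))


def n_matching_concepts(query: Concepts, doc: Concepts) -> int:
    """Second form: N is the number of concepts shared by query and document."""
    return len(query & doc)


def rank(
    query: Concepts,
    docs: Dict[str, Concepts],
    coefficient: Coefficient,
    choose_n: NChooser,
    reverse_order: bool = False,
) -> List[str]:
    """Return document ids ordered by score.

    By default the highest score comes first. With reverse_order=True the
    list is flipped, which corresponds to the third form when combined
    with n_matching_concepts.
    """
    scored = [
        (coefficient(query, concepts, choose_n(query, concepts)), doc_id)
        for doc_id, concepts in docs.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=not reverse_order)
    return [doc_id for _, doc_id in scored]


def toy_coefficient(query: Concepts, doc: Concepts, n: int) -> float:
    """Placeholder score (an assumption): shared concepts divided by N.

    This is NOT the Reitsma-Sagalyn formula; it only makes the sketch runnable.
    """
    return len(query & doc) / n if n else 0.0


query = {"a", "b", "c"}
docs = {"d1": {"a", "b", "c", "d"}, "d2": {"a"}, "d3": {"x", "y"}}

first_form = rank(query, docs, toy_coefficient, n_max_concepts)
flipped = rank(query, docs, toy_coefficient, n_max_concepts, reverse_order=True)
```

Keeping the choice of N and the ranking direction as separate arguments makes it possible to compare the three forms with the same coefficient and the same collection.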