Scientific Report No. IRS-13 Information Storage and Retrieval
Correlation Measures
K. Reitsma
J. Sagalyn
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
too is analyzed in the same manner as the documents and is represented by
a description vector of concept numbers and corresponding weights.
Within the system, request-document comparisons are made using a
mathematical correlation coefficient. Each document is compared with the
request by calculating the magnitude of the coefficient. The documents are
then ranked according to the coefficient and, it is hoped, according to
their degree of relevance to the request.
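The ranking step described above can be sketched as follows. This is only
an illustration: the cosine coefficient is assumed here as one possible
correlation measure, and the names cosine and rank_documents, as well as
the request and document vectors, are hypothetical and do not come from
the report. Each description vector is stored as a mapping from concept
number to weight.

```python
import math


def cosine(request, document):
    """Cosine correlation between two sparse description vectors.

    Each vector maps a concept number to its weight. The cosine measure
    is an assumption made for this sketch, not the report's chosen one.
    """
    shared = set(request) & set(document)
    dot = sum(request[c] * document[c] for c in shared)
    norm_r = math.sqrt(sum(w * w for w in request.values()))
    norm_d = math.sqrt(sum(w * w for w in document.values()))
    if norm_r == 0 or norm_d == 0:
        return 0.0  # an empty vector correlates with nothing
    return dot / (norm_r * norm_d)


def rank_documents(request, documents, coefficient=cosine):
    """Return (document id, coefficient) pairs, highest coefficient first."""
    scored = [(doc_id, coefficient(request, vec))
              for doc_id, vec in documents.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical request and document vectors (concept number -> weight).
request = {3: 2.0, 7: 1.0}
documents = {
    "D1": {3: 1.0, 7: 1.0},
    "D2": {5: 4.0},
    "D3": {3: 2.0, 7: 1.0, 9: 2.0},
}

ranking = rank_documents(request, documents)
for doc_id, score in ranking:
    print(doc_id, round(score, 3))
```

Any other coefficient with the same two-argument form could be passed as
the coefficient parameter, which is what makes a side-by-side comparison
of several measures possible.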
The subject of this study is to evaluate several correlation
coefficients to determine which one is the "best" to use. The "best"
coefficient should be the one for which the largest number of relevant
documents are found at the top of the ranked document list.
Some work has been done with various correlation coefficients.
In 1966, Manning and Hall analyzed several correlation coefficients and
proposed two of their own; however, they did not present any conclusive
evidence for an evaluation. It is the aim of this study to evaluate several
of the coefficients previously used by Manning and Hall, including one which
they proposed, as well as a few others which have been derived from other
types of coefficients.
The initial evaluation has been done on the ADI collection of
82 documents and 35 requests. Since this collection is small, any
conclusion must be verified on a larger collection, such as the
200-document Cranfield collection.
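The selection criterion stated above, that the best coefficient places the
largest number of relevant documents at the top of the ranked list, can be
sketched as follows. The cutoff k that defines the "top" of the list, the
function names, and the rankings shown are all assumptions made for this
illustration; the report does not specify them at this point. A single
request is used for simplicity.

```python
def relevant_in_top(ranked_ids, relevant_ids, k):
    """Count the relevant documents among the first k entries of a ranking."""
    return sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)


def best_coefficient(rankings_by_coefficient, relevant_ids, k):
    """Return the name of the coefficient with the most relevant documents
    in its top k. On a tie, the first name encountered is returned."""
    return max(
        rankings_by_coefficient,
        key=lambda name: relevant_in_top(
            rankings_by_coefficient[name], relevant_ids, k
        ),
    )


# Hypothetical rankings of the same five documents under two coefficients.
rankings = {
    "coefficient_A": ["D4", "D1", "D7", "D2", "D9"],
    "coefficient_B": ["D1", "D2", "D4", "D9", "D7"],
}
relevant = {"D1", "D2", "D9"}  # hypothetical relevance judgments

counts = {name: relevant_in_top(ids, relevant, 3)
          for name, ids in rankings.items()}
winner = best_coefficient(rankings, relevant, 3)
print(counts, winner)
```

With a collection of many requests, the same count would presumably be
accumulated or averaged over all requests before the coefficients are
compared.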
2. Weighted versus Logical Description Vectors
A document description vector can take on two forms. One is a
logical or binary vector in which every element is either 0 or 1. Each
position in the vector represents a concept (e.g. the first position represents