Scientific Report No. ISR-11 Information Storage and Retrieval
Relevance Feedback in an Information Retrieval System
W. Riddle
T. Horwitz
R. Dietz
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
Co-occurrence correlation function: [6]

\[
\frac{\sum_{i=1}^{m} q_i d_i}{\left( \sum_{i=1}^{m} q_i q_i + \sum_{i=1}^{m} d_i d_i \right) - \sum_{i=1}^{m} q_i d_i}
\]

Simple vector matching correlation function:*

\[
\frac{\sum_{i=1}^{m} q_i d_i}{m}
\]

where m = the number of indexing concepts
q_i = the ith concept weight of the query vector
d_i = the ith concept weight of the document vector

The effect of these different correlation functions on the relevance
feedback process is not known, so the correlation function is included as
another parameter in this investigation.
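
Treating the correlation function as a parameter can be sketched in a few lines of Python. This is a minimal illustration, not the system's actual implementation. The function `simple_matching` follows the simple vector matching definition above (the sum of q_i * d_i divided by m). The function `overlap_style` is an assumed Tanimoto-style coefficient, included only as a second candidate to swap in. The names `rank`, `simple_matching`, and `overlap_style` are hypothetical.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def _dot(q: Vector, d: Vector) -> float:
    """Sum of q_i * d_i over all m indexing concepts."""
    if len(q) != len(d):
        raise ValueError("query and document vectors must have the same length")
    return sum(qi * di for qi, di in zip(q, d))


def simple_matching(q: Vector, d: Vector) -> float:
    """Simple vector matching: sum of q_i * d_i divided by m."""
    return _dot(q, d) / len(q)


def overlap_style(q: Vector, d: Vector) -> float:
    """Assumed Tanimoto-style coefficient, for illustration only."""
    qd = _dot(q, d)
    denom = _dot(q, q) + _dot(d, d) - qd
    return qd / denom if denom else 0.0


def rank(query: Vector,
         documents: Sequence[Vector],
         correlation: Callable[[Vector, Vector], float]) -> List[int]:
    """Return document indices ordered by decreasing correlation with the query.

    Ties are broken by the original document index.
    """
    scores = [correlation(query, d) for d in documents]
    return sorted(range(len(documents)), key=lambda i: (-scores[i], i))


# Binary query and document vectors over m = 4 concepts.
query = [1, 0, 1, 1]
documents = [[1, 0, 0, 0], [1, 0, 1, 1], [0, 1, 0, 0]]

ranking_simple = rank(query, documents, simple_matching)
ranking_overlap = rank(query, documents, overlap_style)
```

Because `rank` takes the correlation function as an argument, the same retrieval loop can be rerun under each candidate function and the resulting rankings compared.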
a) Determination of the Relevance Weighting Factors
In determining the relevance weighting factors the assumption is made
that no information concerning the relative ranking of the relevant docu-
ments is available. That is, there is no way of knowing if one relevant
document is more relevant than another. This is consistent with the
proposed information retrieval system, in which the user returns only
"relevant" or "non-relevant" judgments, without indicating the degree of
relevance of each document retrieved. This implies that the numerical
interpretation of the relevance information should be binary; therefore a
weight of 1 is used as the relevance weighting factor of a relevant document
* The simple vector matching correlation function, as stated, is strictly
suitable for use only with binary document and query vectors. Its use
with other than binary vectors, without the addition of a normalization
factor, does, however, preserve the relative rankings of the documents.
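
The rank-preservation remark in the footnote can be checked numerically. The sketch below assumes one reading of that remark: for a fixed query, any normalization factor that is the same for every document rescales all scores by a positive constant and so cannot change their order. The weighted vectors and the constant `norm_const` are invented for illustration and are not taken from the report.

```python
def raw_score(q, d):
    """Simple vector matching applied directly to weighted (non-binary) vectors."""
    return sum(qi * di for qi, di in zip(q, d)) / len(q)


def ranking(scores):
    """Document indices in order of decreasing score, ties broken by index."""
    return sorted(range(len(scores)), key=lambda i: (-scores[i], i))


# Hypothetical weighted query and document vectors over m = 3 concepts.
query = [2, 0, 1]
documents = [[3, 1, 0], [0, 4, 2], [1, 1, 5]]

# Unnormalized scores: these can exceed 1 when the weights are not binary.
raw = [raw_score(query, d) for d in documents]

# Hypothetical normalization factor, constant across documents for this query.
norm_const = max(query) * max(max(d) for d in documents)
scaled = [s / norm_const for s in raw]

same_order = ranking(raw) == ranking(scaled)
```

Here the unnormalized scores fall outside the range 0 to 1, yet `ranking(raw)` and `ranking(scaled)` list the documents in the same order. A normalization factor that varies from document to document, such as one depending on document length, would not be covered by this argument.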