ISR11
Scientific Report No. ISR-11 Information Storage and Retrieval
Relevance Feedback in an Information Retrieval System
W. Riddle
T. Horwitz
R. Dietz
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
a, the query q' is obtained and all the relevant documents are retrieved.
The final values of recall and precision depend on the number of
relevant documents retrieved on the successive searches, since more
information will obviously perturb the query to a greater extent. In
particular, there is a dependence on the number of relevant documents
retrieved initially, which is, in turn, dependent on the correlation
function used. (In this investigation, the dependence is actually on the
denominator of the correlation formula, since all of the functions tested
possess the same numerator.) If only a few of the relevant documents are
retrieved initially, then convergence is slow. In other words, given a
query having three relevant documents, the probability of retrieving all
three is higher if two of the documents are retrieved initially rather
than only one. As shown in Figure ii[OCRerr], for query [OCRerr]Al5 the cosine correla-
tion function initially retrieves three relevant documents, while the co-
occurrence and simple vector matching correlation functions retrieve two
and four respectively. Since the simple vector matching case now includes
more information concerning the concepts in the relevant documents, the
final values of recall and precision achieved by the modification process
are higher when simple vector matching is used as the correlation function,
than when either of the other two functions is used. These results suggest
that it is unwise to restrict the proposed retrieval system to the use of
a single correlation function.
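
The effect described above can be illustrated with a small sketch. It is not
the procedure used in this investigation: the toy vectors, the helper names
(inner, simple_match, cosine, retrieve, feedback), and the additive update
rule with weight alpha are all assumptions made for illustration. Only two of
the three functions are shown, since the co-occurrence denominator is not
stated in this passage. Both functions share the inner-product numerator and
differ only in the denominator, yet they retrieve different documents
initially, and therefore feed different information into the modified query.

```python
import math

def inner(q, d):
    """Shared numerator: sum of products of matching concept weights."""
    return sum(w * d.get(c, 0.0) for c, w in q.items())

def simple_match(q, d):
    """Simple vector matching: the numerator alone (denominator of 1)."""
    return inner(q, d)

def cosine(q, d):
    """Cosine correlation: numerator divided by the product of vector lengths."""
    norm = math.sqrt(inner(q, q)) * math.sqrt(inner(d, d))
    return inner(q, d) / norm if norm else 0.0

def retrieve(q, docs, corr, k):
    """Return the names of the k documents correlating most highly with q."""
    ranked = sorted(docs, key=lambda name: corr(q, docs[name]), reverse=True)
    return ranked[:k]

def feedback(q, relevant, alpha=1.0):
    """Assumed additive update: q' = q + alpha * (sum of relevant vectors)."""
    new_q = dict(q)
    for d in relevant:
        for c, w in d.items():
            new_q[c] = new_q.get(c, 0.0) + alpha * w
    return new_q

# Hypothetical toy collection: concept -> weight.
docs = {
    "d1": {"a": 1, "b": 1},
    "d2": {"a": 3, "c": 4},
    "d3": {"b": 1, "c": 1},
}
q = {"a": 1, "b": 1}

by_match = retrieve(q, docs, simple_match, 2)   # favours heavily weighted d2
by_cosine = retrieve(q, docs, cosine, 2)        # length-normalised ranking
q_prime = feedback(q, [docs[name] for name in by_cosine])
```

In this toy case the two functions agree on only one of the two documents
retrieved initially, so the modified query q' depends on which function was
used for the first search.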
4. Conclusions
The implicit assumption underlying this investigation is that relevance
feedback is a necessary part of the overall retrieval process. As the