ISR11
Scientific Report No. ISR-11 Information Storage and Retrieval
Relevance Feedback in an Information Retrieval System
W. Riddle
T. Horwitz
R. Dietz
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
the documents to a user's need.
Attempts have been made to improve the results obtained in a single
search through a document collection by improving the query before a search
is made, or by using a correlation function which better reflects
relevance. Improvement of a query by an expansion done by the system
prior to a search of the document set has been suggested. [14] This
expansion is done either on the basis of statistically determined concept
relations, or on the basis of a concept hierarchy, and causes concepts
to be added to the query vector if they do not originally appear but are
statistically correlated with or hierarchically related to concepts which
do appear. It has also been suggested that the user himself reformulate
his query prior to the search, and tests using the SMART System [3]
indicate that improved results, in terms of the number of relevant
documents retrieved, are obtained by this method. The reformulation is done
before the query is processed, on the basis of a statistical analysis of
the document set with respect to the index terms present in the original
query. The improvement is effected by the elimination of those terms
which have a high frequency in the document set (and are therefore not
adequate differentiators), and reinforcement of those terms appearing
infrequently in the document set (i.e. good differentiators). Maron and
Kuhns have suggested a correlation technique using relevance numbers.
These numbers are determined by probabilistic indexing, a method in which
the indexer assigns to each index term a numerical value indicating the
probabilistic value of that term to the document being indexed. [1] These methods, however,
are not entirely adequate, since either they depend on a priori determina-
tion of relevance relationships which may not apply to the entire user