Scientific Report No. ISR-11, Information Storage and Retrieval
Relevance Feedback in an Information Retrieval System
W. Riddle, T. Horwitz, R. Dietz
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

the documents to a user's need. Attempts have been made to improve the results obtained in a single search through a document collection by improving the query before a search is made, or by using a correlation function which better reflects relevance.

Improvement of a query by an expansion done by the system prior to a search of the document set has been suggested [14]. This expansion is done either on the basis of statistically determined concept relations, or on the basis of a concept hierarchy, and causes concepts to be added to the query vector if they do not originally appear but are statistically correlated with or hierarchically related to concepts which do appear.

It has also been suggested that the user himself reformulate his query prior to the search, and tests using the SMART System [3] indicate that improved results, in terms of the number of relevant documents retrieved, are obtained by this method. The reformulation is done before the query is processed, on the basis of a statistical analysis of the document set with respect to the index terms present in the original query. The improvement is effected by the elimination of those terms which have a high frequency in the document set (and are therefore not adequate differentiators), and reinforcement of those terms appearing infrequently in the document set (i.e. good differentiators).

Maron and Kuhns have suggested a correlation technique using relevance numbers. These numbers are determined by probabilistic indexing, a method in which the indexer assigns a numerical value indicating the probabilistic value of that term to the document being indexed.
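The frequency-based query reformulation described above (eliminating query terms that occur in many documents and reinforcing those that occur in few) can be illustrated with a minimal sketch. The report gives no formula for this step, so the function names, the two frequency thresholds, and the multiplicative boost below are all assumptions chosen for illustration. They are not the procedure used in the cited tests.

```python
from collections import Counter


def document_frequency(docs):
    """Return, for each term, the fraction of documents containing it."""
    counts = Counter()
    for doc in docs:
        counts.update(set(doc))  # count each term once per document
    total = len(docs)
    return {term: count / total for term, count in counts.items()}


def reformulate(query, docs, drop_above=0.5, boost_below=0.25, boost=2.0):
    """Reweight a query using document-set statistics.

    Hypothetical rule (assumed, not taken from the report):
      - drop a term found in more than `drop_above` of the documents;
      - multiply by `boost` the weight of a term found in fewer than
        `boost_below` of the documents;
      - leave every other term unchanged.
    A term absent from the collection has frequency 0 and is therefore
    also boosted in this sketch.
    """
    df = document_frequency(docs)
    new_query = {}
    for term, weight in query.items():
        freq = df.get(term, 0.0)
        if freq > drop_above:
            continue  # poor differentiator: eliminate
        if freq < boost_below:
            weight *= boost  # good differentiator: reinforce
        new_query[term] = weight
    return new_query


# Toy collection: each document is a list of index terms.
docs = [
    ["information", "retrieval", "feedback"],
    ["information", "storage", "index"],
    ["information", "retrieval", "query"],
    ["information", "hierarchy", "concept"],
    ["information", "probabilistic", "index"],
]
query = {"information": 1.0, "retrieval": 1.0, "feedback": 1.0}
print(reformulate(query, docs))
```

In this toy collection "information" occurs in every document and is dropped, "retrieval" occurs in two of five and is kept unchanged, and "feedback" occurs in one of five and has its weight doubled.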
[1] These methods, however, are not entirely adequate, since either they depend on a priori determination of relevance relationships which may not apply to the entire user