ISR10
Scientific Report No. ISR-10 Information Storage and Retrieval
Search Request Formulation
chapter
Joseph John Rocchio
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
IRE Transactions on Electronic Computers (March - September, 19[illegible]) was
available for this purpose.[footnote marker illegible] Both the reference documents and each
of the search requests which had been submitted at Harvard in the
natural language were indexed using the SMART thesaurus. As the search
requests had been used in a variety of previous retrieval experiments
with this collection, relevance judgments for each query were also
available, representing a full manual search through the complete
reference collection.
A full retrieval ordering of the source documents with respect
to each sample query was available, consisting of the correlation of
each search request index image with every reference document image.
From the initial portion of the retrieved list (ordered by descending
correlations), two sets of documents were specified: one containing
relevant documents and one containing nonrelevant documents. The vector
index images of each search request, and the images of the documents in
the two associated subsets were used as inputs to a Fortran program
written to implement the query modification process. The output of
this program was a new query vector suitable for input to the SMART
system. The modified query images could then be correlated with the
reference collection and the results compared with those of the original
search requests.
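
The experimental loop described above can be illustrated with a minimal sketch. It assumes cosine correlation between index images and an update that adds the mean of the relevant document vectors to the query, subtracts the mean of the nonrelevant ones, and sets negative weights to zero. These choices, the function names, and the toy data are illustrative assumptions. The sketch is not the Fortran program referred to in the text, whose actual steps are given in Table 3.1.

```python
import math

def cosine(u, v):
    """Cosine correlation between two equal-length term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank(query, docs):
    """Return document ids ordered by descending correlation with the query."""
    return sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)

def modify_query(query, relevant, nonrelevant):
    """Assumed update: query + mean(relevant) - mean(nonrelevant), clipped at 0."""
    def mean(vectors):
        if not vectors:
            return [0.0] * len(query)
        return [sum(col) / len(vectors) for col in zip(*vectors)]
    pos, neg = mean(relevant), mean(nonrelevant)
    return [max(0.0, q + p - n) for q, p, n in zip(query, pos, neg)]

# Hypothetical four-concept index images for a tiny reference collection.
docs = {
    "d1": [1.0, 1.0, 0.0, 0.0],
    "d2": [1.0, 0.0, 2.0, 0.0],
    "d3": [0.0, 0.0, 1.0, 1.0],
    "d4": [0.0, 1.0, 0.0, 1.0],
}
judged_relevant = {"d1", "d4"}   # stands in for the manual relevance judgments
query = [1.0, 0.0, 0.0, 0.0]     # original search request image

initial = rank(query, docs)      # full retrieval ordering for the original query
top = initial[:2]                # initial portion of the retrieved list
relevant = [docs[d] for d in top if d in judged_relevant]
nonrelevant = [docs[d] for d in top if d not in judged_relevant]

new_query = modify_query(query, relevant, nonrelevant)
reranked = rank(new_query, docs) # ordering to compare against the original
```

In this toy case the modified query moves the relevant document d4 ahead of the nonrelevant document d2, which is the kind of comparison between original and modified orderings that the experiment makes.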
Table 3.1 describes the program steps used to implement the
relevance feedback query modification algorithm. Figure 3.5(a) shows
the English text of a typical query. Figure 3.5(b) shows the explicit
thesaurus mapping for the terms included in this query and part (c)
shows the index image of the query in vector form (see Appendix A for