ISR10
Scientific Report No. ISR-10 Information Storage and Retrieval
Search Request Formulation
chapter
Joseph John Rocchio
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
an explanation of how the concept weights are derived.) The first
part of the algorithm implements equation [OCRerr] directly and results
in a new query image as shown in part (d) of Figure [OCRerr].5. The
application of a screening process to this vector results in a final
modified query image as shown in part (e). The screening process is
designed to eliminate any negative components in the modified query
image, as well as to reduce the positive nonzero components to those
most likely to be useful. This latter feature is incorporated since
the statistical evidence implicit in the user's relevance judgments
may represent a relatively small sample.
Concepts which are retained after screening either a) occur
in the original query, or b) occur in at least half of the relevant
documents identified in addition to being more frequent in the
relevant set than in the nonrelevant set. The screening algorithm
thus serves to prevent the modified query from becoming too
specialized to those relevant documents identified in the initial retrieval
operation. In addition, reducing the number of nonzero components in
the modified query image provides increased efficiency. With fewer
components the modified search request requires less storage space and
can be correlated with reference document images with fewer operations.
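
The screening rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the report's own program: query and document images are assumed to be sparse mappings from concept identifiers to weights, "more frequent" is read as "occurs in more documents" (a comparison of proportions would be another reasonable reading), and negative components are assumed to be dropped even when the concept occurs in the original query. The function name screen_query and all sample weights are hypothetical.

```python
from typing import Dict, List

# Sparse image: concept identifier -> weight (a missing key means weight zero).
Vector = Dict[str, float]


def screen_query(
    original: Vector,
    modified: Vector,
    relevant: List[Vector],
    nonrelevant: List[Vector],
) -> Vector:
    """Return a screened copy of `modified` (hypothetical helper)."""

    def doc_count(concept: str, docs: List[Vector]) -> int:
        # Number of documents in which the concept has a nonzero weight.
        return sum(1 for d in docs if d.get(concept, 0) != 0)

    screened: Vector = {}
    for concept, weight in modified.items():
        if weight <= 0:
            continue  # assumption: negative (and zero) components are always dropped
        in_original = original.get(concept, 0) != 0  # condition a)
        n_rel = doc_count(concept, relevant)
        n_non = doc_count(concept, nonrelevant)
        well_supported = (  # condition b)
            len(relevant) > 0
            and 2 * n_rel >= len(relevant)  # in at least half of the relevant documents
            and n_rel > n_non               # assumption: compare document counts
        )
        if in_original or well_supported:
            screened[concept] = weight
    return screened


# Made-up data: an unscreened modified query and the user's judged documents.
original = {"retrieval": 1.0, "index": 1.0}
modified = {"retrieval": 2.0, "index": -0.5, "feedback": 1.7,
            "query": 0.3, "ranking": 0.7, "library": -0.5}
relevant = [
    {"retrieval": 2.0, "feedback": 1.0},
    {"retrieval": 1.0, "feedback": 3.0, "query": 1.0},
    {"feedback": 1.0, "ranking": 2.0},
]
nonrelevant = [
    {"index": 2.0, "library": 1.0},
    {"index": 1.0, "query": 1.0},
]

result = screen_query(original, modified, relevant, nonrelevant)
print(result)
```

In this made-up example "retrieval" survives because it occurs in the original query, "feedback" survives because it occurs in all three relevant documents and in no nonrelevant one, and the remaining concepts are removed either for carrying a negative weight or for occurring in fewer than half of the relevant documents.
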
Negative components in a modified query image represent
properties in the index space which are more significant among the
nonrelevant documents retrieved by the user's original search request
than among the retrieved relevant documents. In principle, then, there
are no conceptual difficulties in allowing such negative weights. This
is in contrast to generating property vector index representations of