ISR10
Scientific Report No. ISR-10 Information Storage and Retrieval
Search Request Formulation
chapter
Joseph John Rocchio
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
produce a modified query which is positioned centrally with respect
to the relevant documents while maintaining maximum distance from the
nonrelevant documents. This is possible, however, only in so far as
the index images of the relevant set are differentiable from those of
the nonrelevant set.
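
As an illustration of this kind of adjustment, the following minimal sketch moves a query vector toward the mean of the relevant document vectors and away from the mean of the nonrelevant ones. The function name `adjust_query`, the equal weighting of the two means, and the clipping of negative weights to zero are assumptions made for the example, not details taken from this page.

```python
def adjust_query(query, relevant, nonrelevant):
    """Hypothetical feedback step: query + mean(relevant) - mean(nonrelevant)."""
    def mean(vectors):
        # Mean vector of a set; the zero vector if the set is empty.
        if not vectors:
            return [0.0] * len(query)
        return [sum(col) / len(vectors) for col in zip(*vectors)]

    rel_mean = mean(relevant)
    non_mean = mean(nonrelevant)
    # Negative term weights are clipped to zero (an assumption of this sketch).
    return [max(0.0, q + r - n) for q, r, n in zip(query, rel_mean, non_mean)]


# Illustrative three-term index space (made-up weights).
query = [1.0, 0.0, 0.0]
relevant = [[1.0, 1.0, 0.0], [1.0, 3.0, 0.0]]
nonrelevant = [[0.0, 0.0, 2.0]]
modified = adjust_query(query, relevant, nonrelevant)
```

The modified query gains weight on terms that characterize the relevant documents and loses it on terms found only in the nonrelevant one. This helps only when the two sets differ in their index images.
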
In this context it is possible that the information needs of
a user might be best satisfied by a multiple rather than a single
search request. This would be the case, for example, if useful
references happened to be mapped by the index transformation into
several distinct regions of the index space. Since the user in
general has no a priori means of determining whether he should use
a single or a multiple search (other than his own intuition), it is
of interest to consider automatic means for generating multiple
searches. Assume, for example, that the relevant set ? identified by
a user after an initial retrieval operation contains document images
sufficiently separated so as to be considered only slightly related.
Figure 3.3 shows an example in two dimensions. Under the circumstances
portrayed, the relevance feedback adjustment algorithm is not
useful since, in fact, there is no single vector close to both
relevant document images. This suggests that useful information can
be derived by measuring the degree of association among the elements
of the relevant subset identified by the user. Such information is
contained in the document-document correlation matrix which character-
izes this subset.
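
A minimal sketch of how such a matrix might be computed and used is given here. The cosine correlation, the threshold of 0.5, and the names `correlation_matrix` and `group_documents` are assumptions chosen for illustration. The sketch groups together relevant documents that are linked, directly or through other documents, by a correlation at or above the threshold.

```python
import math


def cosine(a, b):
    """Cosine correlation of two term-weight vectors (0.0 if either is all zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def correlation_matrix(docs):
    """Document-document correlation matrix for a set of document vectors."""
    return [[cosine(a, b) for b in docs] for a in docs]


def group_documents(docs, threshold=0.5):
    """Group documents linked, directly or through others, by correlation >= threshold."""
    matrix = correlation_matrix(docs)
    unvisited = set(range(len(docs)))
    groups = []
    while unvisited:
        start = min(unvisited)
        unvisited.remove(start)
        stack = [start]
        group = [start]
        while stack:
            i = stack.pop()
            # Collect the links first so the set is not changed while iterating.
            linked = sorted(j for j in unvisited if matrix[i][j] >= threshold)
            for j in linked:
                unvisited.remove(j)
                group.append(j)
                stack.append(j)
        groups.append(sorted(group))
    return groups


# Two relevant documents in two dimensions, only slightly related (made-up weights).
relevant = [[1.0, 0.1], [0.1, 1.0]]
matrix = correlation_matrix(relevant)
groups = group_documents(relevant)
```

Here the off-diagonal correlation is about 0.2, so the two documents fall into separate groups. Under the assumptions of this sketch, a separate search request could then be formed for each group.
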
Consider, for example, the situation described by the