ISR10
Scientific Report No. ISR-10 Information Storage and Retrieval
Search Request Formulation
chapter
Joseph John Rocchio
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
correlation matrix of the figure. It is assumed
that the set $R = \{r_1, r_2, r_3, r_4, r_5\}$ has been identified as a relevant
set in response to some initial search request, $q_0$. These document
images are correlated against each other, producing the correlation
matrix shown. If this matrix is used as a basis for partitioning the
set $R$ by some clustering technique, the subsets $R^1 = \{r_1, r_2, r_5\}$ and
$R^2 = \{r_3, r_4\}$ will result. In this case then, the system can generate
two new queries by using each of these subsets together with the non-
relevant set S. Thus the following pair of new search requests can be
formed:
\[
q_1^{1} = n_1' n_2\, q_0 + n_2 \sum_{r_i \in R^1} r_i - n_1' \sum_{s_i \in S} s_i \tag{5.11}
\]
and
\[
q_1^{2} = n_1'' n_2\, q_0 + n_2 \sum_{r_i \in R^2} r_i - n_1'' \sum_{s_i \in S} s_i \tag{5.12}
\]
where $n_1' = n(R^1)$, $n_1'' = n(R^2)$, and $n_2 = n(S)$.
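
As an illustration of how the pair of requests (5.11) and (5.12) could be computed, the following is a minimal Python sketch, not the report's implementation. The function names `vector_sum` and `split_query` are hypothetical, and the three-term vectors are invented purely for the example. It assumes that every query and document image is a list of term weights of equal length.

```python
from typing import List, Sequence

Vector = List[float]


def vector_sum(vectors: Sequence[Sequence[float]], dim: int) -> Vector:
    """Component-wise sum of a collection of vectors of length `dim`."""
    total = [0.0] * dim
    for vec in vectors:
        for k, weight in enumerate(vec):
            total[k] += weight
    return total


def split_query(q0: Sequence[float],
                relevant_subset: Sequence[Sequence[float]],
                nonrelevant: Sequence[Sequence[float]]) -> Vector:
    """Form one new request from one relevant subset and the set S.

    Implements q1 = n1*n2*q0 + n2*sum(r_i) - n1*sum(s_i), where
    n1 is the size of the relevant subset and n2 is the size of S.
    """
    n1 = len(relevant_subset)
    n2 = len(nonrelevant)
    dim = len(q0)
    r_sum = vector_sum(relevant_subset, dim)
    s_sum = vector_sum(nonrelevant, dim)
    return [n1 * n2 * q0[k] + n2 * r_sum[k] - n1 * s_sum[k]
            for k in range(dim)]


# Invented example data (assumption): three index terms per vector.
q0 = [1.0, 0.0, 1.0]
R1 = [[1.0, 1.0, 0.0], [1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]  # r1, r2, r5
R2 = [[0.0, 0.0, 1.0], [0.0, 1.0, 1.0]]                   # r3, r4
S = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]                    # non-relevant set

q1_first = split_query(q0, R1, S)   # request built from R^1
q1_second = split_query(q0, R2, S)  # request built from R^2
print(q1_first, q1_second)
```

Each call uses the same original request and the same non-relevant set; only the relevant subset changes, which is what makes the two resulting requests point toward different groups of documents.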
On the basis of a partition of the relevant subset identified
by the user, two new search requests have been formed from a single
original request. Roughly, this procedure amounts to allowing the
user to identify particular documents in the collection and request
additional references "like" those he has singled out. By examining
the degree of association among the identified documents, it is
possible to determine if this can be done efficiently with a single search
request or whether multiple searching is required.
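
The decision between a single request and multiple requests can be sketched as follows. The text specifies only "some clustering technique", so the choices here are assumptions made for illustration: cosine correlation between document images, and a single-link grouping in which two documents fall in the same subset whenever a chain of correlations at or above a cutoff connects them. The names `cosine`, `correlation_matrix`, `partition_by_threshold`, and the cutoff values are hypothetical, and the vectors are invented.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine correlation of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def correlation_matrix(docs: Sequence[Sequence[float]]) -> List[List[float]]:
    """Correlate every document image against every other."""
    return [[cosine(a, b) for b in docs] for a in docs]


def partition_by_threshold(docs: Sequence[Sequence[float]],
                           threshold: float) -> List[List[int]]:
    """Group document indices linked by correlations >= threshold."""
    corr = correlation_matrix(docs)
    unvisited = set(range(len(docs)))
    groups: List[List[int]] = []
    for start in range(len(docs)):
        if start not in unvisited:
            continue
        unvisited.discard(start)
        stack, group = [start], []
        while stack:
            i = stack.pop()
            group.append(i)
            # sorted() copies the set, so it is safe to shrink it here.
            for j in sorted(unvisited):
                if corr[i][j] >= threshold:
                    unvisited.discard(j)
                    stack.append(j)
        groups.append(sorted(group))
    return groups


# Invented document images for r1..r5 (indices 0..4).
relevant = [
    [1.0, 1.0, 0.0],  # r1
    [1.0, 0.0, 0.0],  # r2
    [0.0, 0.0, 1.0],  # r3
    [0.0, 1.0, 1.0],  # r4
    [1.0, 1.0, 0.0],  # r5
]

subsets = partition_by_threshold(relevant, threshold=0.6)
needs_multiple_requests = len(subsets) > 1
print(subsets, needs_multiple_requests)
```

With these invented vectors and a cutoff of 0.6, the grouping separates r1, r2, r5 from r3, r4, so one request per subset would be formed. Lowering the cutoff to 0.4 links all five documents into one group, in which case a single modified request would serve.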