ISR10
Scientific Report No. ISR-10 Information Storage and Retrieval
Search Request Formulation
chapter
Joseph John Rocchio
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
of the reference collection. The definition of the optimal request
reflects the partition in terms of the statistical properties of the
correlations of the query and source document index images.
Let $q$ represent the index image of a search request and $d_i$ the
index image of a reference document ($d_i = T(D_i)$). In
mathematical terms, the optimal request vector $q_0$ corresponding to a
subset $D_R$ of $D$ is defined as that vector $q$ which maximizes:

$$C = \frac{1}{n_0} \sum_{d_i \in D_R} \rho(q, d_i) - \frac{1}{m - n_0} \sum_{d_i \notin D_R} \rho(q, d_i) \tag{3.2}$$

where $n_0 = n(D_R)$ is the number of elements in $D_R$, and $m = n(D)$ is the total
number of elements in the reference collection.
Substituting for $\rho(q, d_i)$ and using vector notation results in:

$$C = \frac{1}{n_0} \sum_{d_i \in D_R} \frac{q \cdot d_i}{|q|\,|d_i|} - \frac{1}{m - n_0} \sum_{d_i \notin D_R} \frac{q \cdot d_i}{|q|\,|d_i|} \tag{3.3}$$

or:

$$C = \frac{q}{|q|} \cdot \left[ \frac{1}{n_0} \sum_{d_i \in D_R} \frac{d_i}{|d_i|} - \frac{1}{m - n_0} \sum_{d_i \notin D_R} \frac{d_i}{|d_i|} \right] \tag{3.4}$$
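As a numerical illustration, the following minimal Python sketch evaluates C in two ways: as a difference of mean correlations, and as a single dot product with the normalized request vector. It assumes the correlation is the cosine of the angle between index images, as the vector form above suggests. The function names (`criterion_sums`, `criterion_dot`, `difference_vector`), the four-term document vectors, and the choice of the subset are hypothetical and are not taken from the report.

```python
import math


def norm(v):
    """Euclidean length of a vector given as a list of numbers."""
    return math.sqrt(sum(x * x for x in v))


def cosine(q, d):
    """Cosine correlation between a request vector q and a document vector d."""
    return sum(a * b for a, b in zip(q, d)) / (norm(q) * norm(d))


def criterion_sums(q, docs, subset):
    """C as a difference of mean correlations (inside the subset minus outside).

    Assumes 0 < len(subset) < len(docs), so neither mean divides by zero.
    """
    n0, m = len(subset), len(docs)
    inside = sum(cosine(q, d) for i, d in enumerate(docs) if i in subset)
    outside = sum(cosine(q, d) for i, d in enumerate(docs) if i not in subset)
    return inside / n0 - outside / (m - n0)


def difference_vector(docs, subset):
    """Mean unit vector of the subset minus mean unit vector of the remainder."""
    n0, m = len(subset), len(docs)
    result = [0.0] * len(docs[0])
    for i, d in enumerate(docs):
        weight = 1.0 / n0 if i in subset else -1.0 / (m - n0)
        length = norm(d)
        for k, x in enumerate(d):
            result[k] += weight * x / length
    return result


def criterion_dot(q, docs, subset):
    """C as the dot product of the unit vector along q with the difference vector."""
    v = difference_vector(docs, subset)
    length = norm(q)
    return sum((a / length) * b for a, b in zip(q, v))


# Hypothetical toy collection: four documents over four index terms.
docs = [[1, 1, 0, 0], [2, 1, 0, 1], [0, 0, 1, 2], [0, 1, 3, 0]]
subset = {0, 1}          # indices of the documents forming the chosen subset
q = [1, 0, 0, 1]         # an arbitrary trial request vector

c_sums = criterion_sums(q, docs, subset)
c_dot = criterion_dot(q, docs, subset)
print(c_sums, c_dot)     # the two forms should agree up to rounding error
```

The two functions differ only in where the normalization by the request length is applied, which is why they agree for any nonzero request vector.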
From this last equation it is clear that C is just the dot
product of a unit vector along the direction of q and a vector which