Scientific Report No. ISR-10, Information Storage and Retrieval. Chapter: Search Request Formulation. Joseph John Rocchio, Harvard University. Gerard Salton. Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

of the reference collection. The definition of the optimal request reflects the partition in terms of the statistical properties of the correlations of the query and source document index images.

Let $q$ represent the index image of a search request and $d_i$ the index image of a reference document ($d_i = T(D_i)$). In mathematical terms, the optimal request vector $q_0$ corresponding to a subset $D_R$ of $D$ is defined as that vector $q$ which maximizes:

$$C = \frac{1}{n_0} \sum_{d_i \in D_R} \rho(q, d_i) \;-\; \frac{1}{m - n_0} \sum_{d_i \notin D_R} \rho(q, d_i) \tag{3.2}$$

where $n_0 = n(D_R)$ is the number of elements in $D_R$, and $m = n(D)$ is the total number of elements in the reference collection. Substituting for $\rho(q, d_i)$ and using vector notation results in:

$$C = \frac{1}{n_0} \sum_{d_i \in D_R} \frac{q \cdot d_i}{|q|\,|d_i|} \;-\; \frac{1}{m - n_0} \sum_{d_i \notin D_R} \frac{q \cdot d_i}{|q|\,|d_i|} \tag{3.3}$$

or:

$$C = \frac{q}{|q|} \cdot \left[ \frac{1}{n_0} \sum_{d_i \in D_R} \frac{d_i}{|d_i|} \;-\; \frac{1}{m - n_0} \sum_{d_i \notin D_R} \frac{d_i}{|d_i|} \right] \tag{3.4}$$

From this last equation it is clear that $C$ is just the dot product of a unit vector along the direction of $q$ and a vector which