Scientific Report No. ISR-10 Information Storage and Retrieval
Search Request Formulation
Joseph John Rocchio
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
3. Request Optimization
To define an optimal search request, it is necessary to
start with an explicit formulation of the model which specifies the
retrieval system. In particular, it will be shown that a reasonable
definition of request optimality is directly related to the retrieval
or query-document matching function. In the model outlined in Chapter
1, it was assumed that the matching criterion for selecting reference
documents in response to input queries is the magnitude of the
angular distance between the query vector image and the vector images
of the source documents. It is now convenient to introduce a
query-document correlation function which is a monotonic function of
angular distance in the vector space (over the range 0° ≤ θ ≤ 180°).
Assuming, therefore, that the output of a retrieval operation is a
partial ordering of all source document representations in the
collection, D, derived on the basis of the angular distance from the
input query image, the cosine correlation function,
$$\rho(a,b) = \frac{a \cdot b}{|a|\,|b|} = \cos\theta_{a,b} \qquad (3.1)$$
can be used to induce the same ordering. Note that the correlation
function is inverse to the angular distance in that θ = 0° maps into
ρ = 1, and |θ| = 180° into ρ = -1. Thus the range 0° ≤ |θ| ≤ 180°
maps into the range -1 ≤ ρ ≤ +1, so that increasing angular separation
corresponds to decreasing correlation. It may also be noted that the
restriction of the vector images to nonnegative components (as is the