ISR10 Scientific Report No. ISR-10 Information Storage and Retrieval Search Request Formulation chapter Joseph John Rocchio Harvard University Gerard Salton Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

3. Request Optimization

To define an optimal search request, it is necessary to start with an explicit formulation of the model which specifies the retrieval system. In particular, it will be shown that a reasonable definition of request optimality is directly related to the retrieval or query-document matching function. In the model outlined in Chapter 1, it was assumed that the matching criterion for selecting reference documents in response to input queries is the magnitude of the angular distance between the query vector image and the vector images of the source documents. It is now convenient to introduce a query-document correlation function which is a monotonic function of angular distance in the vector space (over the range $0^\circ \le |\theta| \le 180^\circ$). Assuming, therefore, that the output of a retrieval operation is a partial ordering of all source document representations in the collection, D, derived on the basis of the angular distance from the input query image, the cosine correlation function,

$$\rho(a,b) = \frac{a \cdot b}{|a|\,|b|} = \cos\theta_{a,b} \qquad (3.1)$$

can be used to induce the same ordering. Note that the correlation function is inverse to the angular distance in that $\theta = 0^\circ$ maps into $\rho = 1$, and $|\theta| = 180^\circ$ into $\rho = -1$. Thus the range $0^\circ \le |\theta| \le 180^\circ$ maps into the range $-1 \le \rho \le +1$, so that increasing angular separation corresponds to decreasing correlation. It may also be noted that the restriction of the vector images to nonnegative components (as is the