Scientific Report No. ISR-10
Information Storage and Retrieval
Chapter: The Query-Document Matching Function
Joseph John Rocchio
Harvard University
Gerard Salton

Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

where $n$ is the number of categories. Assume for simplicity that each category subset $C_j$ contains only documents $d_i$ for which

$$\rho(c_j, d_i) \le \theta \quad \text{for all } d_i \in C_j,$$

where $c_j$ is the representation (classification vector) for $C_j$. The distance from the query $q$ to any member $d_i$ of the set $C_j$ may then be bounded by

$$\max\left[0,\ \rho(q, c_j) - \theta\right] \le \rho(q, d_i) \le \rho(q, c_j) + \theta .$$

On a probabilistic basis, then, the category for which $\rho(q, c_j)$ is minimum is the one most likely to contain documents close enough to the query to satisfy the retrieval criterion. Thus the ordering of categories by increasing query-classification vector distance dictates the sequence in which the individual query-document comparisons should be made.

To test the characteristics of this system of query-document searching, the classification algorithm was programmed in Fortran and run on the IBM 7094 to produce several classifications of the collection of 405 IRE abstracts discussed earlier. Retrieval results based on a full search of this collection for a set of 24 sample search requests were available from previous experiments conducted with the SMART system. The objective, then, is to compare the retrieval characteristics of the classification-induced search with those obtained in the full search mode.
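The two-sided bound on the query-document distance can be derived in a few lines, provided the distance function is a metric (nonnegative, symmetric, and obeying the triangle inequality). That property is an assumption here; this excerpt does not state it. Writing $\rho$ for the distance, $c_j$ for the representation of category $C_j$, and $\theta$ for the assumed bound $\rho(c_j, d_i) \le \theta$ on every member $d_i$ of $C_j$, a sketch of the derivation is:

```latex
\begin{aligned}
\rho(q, d_i) &\le \rho(q, c_j) + \rho(c_j, d_i) \le \rho(q, c_j) + \theta, \\
\rho(q, c_j) &\le \rho(q, d_i) + \rho(d_i, c_j) \le \rho(q, d_i) + \theta
  \;\Longrightarrow\; \rho(q, d_i) \ge \rho(q, c_j) - \theta, \\
\rho(q, d_i) &\ge 0
  \;\Longrightarrow\; \max\left[0,\ \rho(q, c_j) - \theta\right] \le \rho(q, d_i) \le \rho(q, c_j) + \theta .
\end{aligned}
```

The lower bound is what makes the category representation useful for search: if $\rho(q, c_j) - \theta$ already exceeds the retrieval criterion, no member of $C_j$ can qualify.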
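A minimal Python sketch of searching categories in order of increasing query-to-category distance, under several assumptions that are not in the report: documents are dense numeric vectors, the distance is Euclidean (`math.dist`), each category representation $c_j$ is the centroid of its members, and each category gets its own radius $\theta_j$, the largest member-to-centroid distance (a single common $\theta$, as the report assumes, would be the maximum of these). The names `centroid` and `category_search` and the list-of-lists data layout are hypothetical. Skipping a category whose lower bound exceeds the criterion is an illustrative use of the bound, not a procedure described in this excerpt.

```python
import math


def centroid(vectors):
    """Component-wise mean of equal-length vectors (assumed form of c_j)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def category_search(query, categories, criterion, dist=math.dist):
    """Hypothetical category-ordered search (illustrative sketch only).

    categories: list of categories, each a non-empty list of document vectors.
    criterion:  a document d is retrieved when dist(query, d) <= criterion.
    Returns (retrieved (category, index) pairs, number of query-document
    comparisons made, order in which categories were considered).
    """
    summaries = []
    for j, docs in enumerate(categories):
        c_j = centroid(docs)
        # theta_j: every member d of C_j satisfies dist(c_j, d) <= theta_j.
        theta_j = max(dist(c_j, d) for d in docs)
        summaries.append((dist(query, c_j), theta_j, j))

    # Order categories by increasing query-to-representation distance.
    summaries.sort()

    retrieved, comparisons, order = [], 0, []
    for rho_qc, theta_j, j in summaries:
        order.append(j)
        # Lower bound on dist(query, d) for any d in C_j (triangle inequality).
        if max(0.0, rho_qc - theta_j) > criterion:
            continue  # no member of C_j can satisfy the criterion
        for i, d in enumerate(categories[j]):
            comparisons += 1
            if dist(query, d) <= criterion:
                retrieved.append((j, i))
    return retrieved, comparisons, order
```

Because $\theta_j$ differs between categories, the lower bound need not increase along the visiting order, so the loop skips a category (`continue`) rather than ending the search. With one common $\theta$ the lower bound is nondecreasing in that order, and the first category whose bound exceeds the criterion could end the search (`break`).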