Scientific Report No. ISR-10
Information Storage and Retrieval
The Query-Document Matching Function
Joseph John Rocchio
Harvard University
Gerard Salton

Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

6. Experimental Results

The classification algorithm described above establishes both a set of document categories and a representation for each such category in the form of its classification vector. Since these vectors are suitable for direct comparison with search requests (i.e., they are identical in form to document index images), implementing a two-level search in the retrieval system is quite straightforward. A user's search request is first matched against the set of classification vectors to determine which categories are most likely to contain documents sufficiently close to the query to be retrieved. Depending on the correlation distribution of the query with the classification vectors, the documents in one or more categories are then retrieved and individually correlated with the search request to produce the final retrieval output.

Assume that the retrieved output for a query $q$ produced in the full search mode is the document subset $R$, where:

$$R = \{\, d_i : \delta(q, d_i) < \delta_r \,\}$$

Let a search over the classification categories produce a set of query-classification vector distances:

$$\delta(q, c_j) = \delta_j, \qquad j = 1, \ldots, n_c.$$

For processing purposes, an integer version of the normalized classification vector of equation (4.2) is produced by scaling and truncation.
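The two-level search described above can be illustrated with a minimal Python sketch. It is not the report's implementation: the distance is assumed here to be one minus the cosine correlation, and the names `full_search`, `two_level_search`, `to_integer_vector`, the `n_categories` cutoff, and the `scale` factor are all hypothetical choices made for illustration. The report's own distance function and the normalization of equation (4.2) may differ.

```python
import math


def cosine(u, v):
    """Cosine correlation of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def distance(u, v):
    """Assumed distance: 1 - cosine correlation (an illustrative choice)."""
    return 1.0 - cosine(u, v)


def full_search(query, documents, threshold):
    """Full search mode: compare the query with every document vector."""
    return {doc_id for doc_id, vec in documents.items()
            if distance(query, vec) < threshold}


def two_level_search(query, categories, documents, threshold, n_categories=1):
    """Two-level search.

    categories maps a category name to (classification_vector, [doc_ids]).
    Level 1 ranks categories by distance from the query; level 2 compares
    the query only with documents in the n_categories closest categories.
    """
    ranked = sorted(categories,
                    key=lambda name: distance(query, categories[name][0]))
    retrieved = set()
    for name in ranked[:n_categories]:
        for doc_id in categories[name][1]:
            if distance(query, documents[doc_id]) < threshold:
                retrieved.add(doc_id)
    return retrieved


def to_integer_vector(vec, scale=100):
    """Integer version of a length-normalized vector by scaling and truncation.

    The scale factor is hypothetical; the report does not state one here.
    """
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        return [0] * len(vec)
    return [int(scale * x / norm) for x in vec]
```

Comparing the result of `two_level_search` with that of `full_search` for the same query and threshold shows how much of the full-search output $R$ is recovered when only the closest categories are examined. Raising `n_categories` trades additional document comparisons for a result closer to the full search.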