ISR10
Scientific Report No. ISR-10 Information Storage and Retrieval
The Query-Document Matching Function
chapter
Joseph John Rocchio
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
6. Experimental Results
The classification algorithm described above establishes both a
set of document categories and a representation for each such category
in the form of its classification vector. Since these vectors are
suitable for direct comparison with search requests (i.e., they are
identical in form to document index images), the implementation of a
two-level search in the retrieval system is quite straightforward. Thus
a user's search request is first matched with the set of classification
vectors to determine which categories are most likely to contain
documents which will be sufficiently close to the query to be retrieved.
Depending on the correlation distribution of the query with the
classification vectors, the documents in one or more categories can be
retrieved and individually correlated with the search request to
produce the final retrieval output.
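The two-level search described above can be sketched as follows. This is only an illustrative outline, not the report's implementation: the cosine correlation, the sparse dictionary representation, and the names `two_level_search`, `class_cutoff`, and `doc_cutoff` are all assumptions introduced here, and the rule of selecting every category whose correlation exceeds a fixed cutoff stands in for whatever rule is applied to the correlation distribution.

```python
import math


def cosine(u, v):
    """Cosine correlation of two sparse vectors (dict: term -> weight)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def two_level_search(query, class_vectors, members, documents,
                     class_cutoff, doc_cutoff):
    """Hypothetical two-level search.

    class_vectors: category id -> classification vector
    members:       category id -> list of document ids in that category
    documents:     document id -> document index vector
    """
    # Level 1: match the query against the classification vectors only.
    selected = [c for c, vec in class_vectors.items()
                if cosine(query, vec) >= class_cutoff]

    # Level 2: correlate the query with each document in the selected
    # categories (a set avoids scoring a shared document twice).
    candidates = {d for c in selected for d in members[c]}
    scored = [(cosine(query, documents[d]), d) for d in candidates]

    # Keep documents that pass the cutoff, best match first.
    return sorted(((s, d) for s, d in scored if s >= doc_cutoff),
                  reverse=True)
```

Only the documents in the selected categories are correlated individually, so the saving relative to a full search depends on how many categories pass the first level.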
Assume that the retrieved output for a query q produced in the
full search mode is the document subset R where:
$$ R = \{\, d_i : g(q, d_i) < \delta_r \,\} $$
Let a search over the classification categories produce a set of
query-classification vector distances:
$$ g(q, c_j) = \alpha_j, \qquad j = 1, \ldots, n_c ; $$
For processing purposes, an integer version of the normalized
classification vector $c_j$ of equation (4.2) is produced by scaling and
truncation.
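A minimal sketch of scaling and truncation is given below. The report does not state the scale factor or the normalization used here, so the unit-length normalization, the default `scale=1000`, and the function names are assumptions made purely for illustration.

```python
import math


def normalize(vec):
    """Scale a dense vector to unit Euclidean length (assumed normalization)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def to_integer_vector(vec, scale=1000):
    """Integer version of the normalized vector.

    Each normalized component is multiplied by `scale` (a hypothetical
    value) and truncated toward zero with int().
    """
    return [int(x * scale) for x in normalize(vec)]
```

Truncation discards the fractional part of each scaled component, so a larger scale factor preserves more of the original precision at the cost of larger integers.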