ISR10
Scientific Report No. ISR-10 Information Storage and Retrieval
The Query-Document Matching Function
chapter
Joseph John Rocchio
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
where n is the number of categories. Assume for simplicity that each
category subset $C_j$ contains only documents $d_i$ for which:

\[ \delta(c_j, d_i) \le \theta \]

where $c_j$ is the representation for $C_j$. The distance from the query q
to any member $d_i$ of the set $C_j$ may be bounded by:

\[ \max[0,\ \delta(q, c_j) - \theta] \;\le\; \delta(q, d_i) \;\le\; \delta(q, c_j) + \theta \]
On a probabilistic basis, then, the category whose classification
vector lies at minimum distance from the query is clearly most likely
to contain documents close enough to the query to satisfy the
retrieval criterion. Thus the ordering of categories by increasing
query-classification vector distance dictates the sequence in which
individual query-document comparisons should be made.
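
The search order described above can be illustrated with a minimal sketch. It is not the Fortran program used in the experiments; it assumes a Euclidean distance (so that the triangle inequality holds), a single radius per category within which all of its documents lie, and a fixed distance cutoff as the retrieval criterion. The names `euclidean`, `distance_bounds`, `classified_search`, `radius`, and `cutoff` are hypothetical and chosen for illustration only.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def distance_bounds(query_to_centroid, radius):
    """Lower and upper bounds on the distance from the query to any member
    of a category whose documents all lie within `radius` of its centroid."""
    lower = max(0.0, query_to_centroid - radius)
    upper = query_to_centroid + radius
    return lower, upper


def classified_search(query, categories, cutoff, dist=euclidean):
    """Compare the query with documents category by category.

    categories: list of (centroid, radius, documents) tuples, where
    documents is a list of (name, vector) pairs.
    Returns (matches sorted by distance, number of document comparisons).
    """
    # Visit categories in order of increasing query-centroid distance.
    ranked = sorted(categories, key=lambda c: dist(query, c[0]))
    matches, comparisons = [], 0
    for centroid, radius, documents in ranked:
        lower, _ = distance_bounds(dist(query, centroid), radius)
        if lower > cutoff:
            # No member of this category can satisfy the cutoff.
            # Radii may differ, so later categories are still checked.
            continue
        for name, vector in documents:
            comparisons += 1
            d = dist(query, vector)
            if d <= cutoff:
                matches.append((name, d))
    return sorted(matches, key=lambda m: m[1]), comparisons


if __name__ == "__main__":
    example = [
        ((0.0, 0.0), 1.0, [("d1", (0.5, 0.0)), ("d2", (0.0, 1.0))]),
        ((10.0, 0.0), 1.0, [("d3", (10.0, 0.5)), ("d4", (9.5, 0.0))]),
    ]
    print(classified_search((0.0, 0.0), example, cutoff=0.75))
```

In this toy case the second category is skipped outright because its lower bound exceeds the cutoff, so only two of the four documents are compared with the query. With a correlation-based measure in place of Euclidean distance the bounds would need to be re-derived, since the triangle inequality is not guaranteed.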
To test the characteristics of this system of query-document
searching, the classification algorithm was programmed in Fortran and
run on the IBM 7094 to produce several classifications of the document
set of 405 IRE abstracts discussed earlier. Retrieval results based
on a full search of this collection for a set of 24 sample search
requests were available from previous experiments conducted with the
SMART system. The objective, then, is to compare the retrieval
characteristics resulting from the classification-induced search
system to those obtained in the full search mode. Equation (4.[OCRerr])