Scientific Report No. ISR-10 Information Storage and Retrieval
The Query-Document Matching Function
chapter
Joseph John Rocchio
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
range of query-document correlations is possible even for the documents
in the category with the highest query-category vector correlation
observed. Thus, in the comparison of recall values, it would be fairer
to eliminate the low-correlating documents from the category-retrieved
subset before comparison with the full search. Once these
comments are noted, however, it is felt that the evaluation parameters
described above are useful in judging the performance of the two-level
search scheme.
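The fairer comparison suggested above can be sketched as follows. This is a minimal illustration, not the program used in this report. It assumes that documents and queries are numeric term-weight vectors, that the cosine correlation serves as the query-document matching function, and that a single correlation threshold (a hypothetical parameter) is applied both to the full search and to the category-retrieved subset. The names `cosine`, `recall`, `prune_subset`, and `fair_recall_pair`, and all data, are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine correlation between two term-weight vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recall(retrieved: List[str], relevant: List[str]) -> float:
    """Fraction of the relevant documents that appear among the retrieved ones."""
    rel = set(relevant)
    return len(rel.intersection(retrieved)) / len(rel) if rel else 0.0


def prune_subset(query: Vector, subset: List[str],
                 docs: Dict[str, Vector], threshold: float) -> List[str]:
    """Drop low-correlating documents from a category-retrieved subset (hypothetical cutoff)."""
    return [d for d in subset if cosine(query, docs[d]) >= threshold]


def fair_recall_pair(query: Vector, docs: Dict[str, Vector], subset: List[str],
                     relevant: List[str], threshold: float) -> Tuple[float, float]:
    """Recall of the pruned category subset and of a full search, both cut at the same threshold."""
    full = [d for d in docs if cosine(query, docs[d]) >= threshold]
    pruned = prune_subset(query, subset, docs, threshold)
    return recall(pruned, relevant), recall(full, relevant)


# Made-up three-term vectors; the category search returned d1 and d3.
docs = {"d1": [1.0, 0.0, 0.0], "d2": [0.9, 0.1, 0.0],
        "d3": [0.0, 1.0, 0.0], "d4": [0.8, 0.0, 0.2]}
query = [1.0, 0.0, 0.0]
print(fair_recall_pair(query, docs, ["d1", "d3"], ["d1", "d2"], threshold=0.5))  # → (0.5, 1.0)
```

Because both sides use the same cutoff, the pruned subset is contained in the full-search result, so its recall can never exceed that of the full search. The gap between the two values then counts only relevant documents that pass the cutoff but lie outside the searched categories.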
A program was written to produce the evaluation parameters, and
the results for a sample search request, "Core Memory", are shown in
Figure 4-12. From part (a) of this figure, one can see that all the
relevant documents can be retrieved by searching only the first two
categories; thus 100% recall results with a total of 69 comparisons:
20 for category matching and 49 for document matching. Figure 4-15
shows the evaluation parameters averaged over the set of 24 search
requests for each of the classifications. Even though the results are
not as good as for the single query shown, it is nevertheless clear
that for a relatively small cost (in terms of missing associated
documents) a large increase in search efficiency can be
gained.
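The evaluation parameters for a single request, and their average over several requests, can be sketched in the same spirit. This is not the program described above. It assumes that each category is represented by a centroid vector plus the identifiers of its documents, that the query is matched once against every category centroid, and that categories are then searched in decreasing order of query-category correlation. For each search depth k, the sketch reports the total number of comparisons (all category matches plus the documents in the first k categories) and the cumulative recall. This cost model is consistent with the split of the "Core Memory" figures into 20 category comparisons plus 49 document comparisons (20 + 49 = 69), but it is an assumption. The names `two_level_evaluation` and `average_rows`, and all data, are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Category = Tuple[Vector, List[str]]  # (centroid vector, document identifiers)


def cosine(u: Vector, v: Vector) -> float:
    """Cosine correlation between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def two_level_evaluation(query: Vector, categories: List[Category],
                         relevant: List[str]) -> List[Tuple[int, int, float]]:
    """Return (k, comparisons, recall) after searching the k best-matching categories."""
    # Level 1: the query is matched against every category centroid, best match first.
    ranked = sorted(categories, key=lambda c: cosine(query, c[0]), reverse=True)
    rel = set(relevant)
    category_comparisons = len(categories)
    doc_comparisons = 0
    found: set = set()
    rows = []
    # Level 2: the query is matched against the documents of each category in turn.
    for k, (_, doc_ids) in enumerate(ranked, start=1):
        doc_comparisons += len(doc_ids)
        found |= rel.intersection(doc_ids)
        rec = len(found) / len(rel) if rel else 0.0
        rows.append((k, category_comparisons + doc_comparisons, rec))
    return rows


def average_rows(per_request: List[List[Tuple[int, int, float]]]) -> List[Tuple[int, float, float]]:
    """Average comparisons and recall at each search depth k over several requests."""
    n = len(per_request)
    depth = min(len(rows) for rows in per_request)
    return [(k + 1,
             sum(rows[k][1] for rows in per_request) / n,
             sum(rows[k][2] for rows in per_request) / n)
            for k in range(depth)]


# Three made-up categories in a two-term space.
categories = [([1.0, 0.0], ["a1", "a2", "a3"]),
              ([0.7, 0.7], ["b1", "b2"]),
              ([0.0, 1.0], ["c1"])]
rows = two_level_evaluation([1.0, 0.0], categories, ["a1", "b2"])
print(rows)  # → [(1, 6, 0.5), (2, 8, 1.0), (3, 9, 1.0)]
```

In this toy run, full recall is reached after two categories with 3 + 5 = 8 comparisons, against the 3 + 6 = 9 that would be needed to examine every document. Under this cost model, the saving grows with the number of documents left in the unsearched categories.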
On the basis of the experimental evidence gained with this
small collection it can be concluded that:
1.) A metric query-document matching function enables an
automatic classification of the type considered to be
easily produced.
2.) Such classification schemes are likely to be more