Scientific Report No. ISR-10, Information Storage and Retrieval. The Query-Document Matching Function (chapter). Joseph John Rocchio, Harvard University. Gerard Salton. Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

range of query-document correlations is possible even for the documents in the category with the highest query-category vector correlation observed. Thus in the comparison of recall values it would be fairer to eliminate the low correlating documents from the category retrieved subset before comparison with the full search.

Once these comments are noted, however, it is felt that the evaluation parameters described above are useful in judging the performance of the two-level search scheme. A program was written to produce the evaluation parameters, and the results for a sample search request, "Core Memory", are shown in Figure 4-12. From part (a) of this figure, one can see that all the relevant documents can be retrieved by searching only the first two categories; thus 100% recall results with a total of 69 comparisons: 20 for category matching and 49 for document matching.

Figure 4.15 shows the evaluation parameters averaged over the set of 24 search requests for each of the classifications. Even though the results are not as good as for the single query shown, it is nevertheless clear that for a relatively small cost (in terms of missing associated documents) a large increase in search efficiency can be gained.

On the basis of the experimental evidence gained with this small collection it can be concluded that:

1.) A metric query-document matching function enables an automatic classification of the type considered to be easily produced.

2.) Such classification schemes are likely to be more