ISR10 Scientific Report No. ISR-10 Information Storage and Retrieval The Query-Document Matching Function chapter Joseph John Rocchio Harvard University Gerard Salton Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

detail for each query the following parameters could be produced:

1.) The total number of documents in the union of the retrieved categories.

2.) The overlap correlation of the category retrieved subset with the first 15 and first 30 documents retrieved by a full search. (The overlap correlation between sets A and B is defined by n(A ∩ B) / minimum(n(A), n(B)).)

3.) The category recall, or percentage of relevant documents in the category retrieved subset relative to the total number of relevant documents.

4.) The normal recall, or percentage of relevant documents retrieved relative to the total number of relevant documents, assuming the same total number of documents retrieved as contained in the category retrieved subset.

It should be noted that this method of evaluating the classification based search is somewhat unfair on two counts. First, it does not consider the correlation distribution of the search requests with the category vectors. Thus when a query has high correlation with only one or two category vectors, only these should be searched. Some queries, however, will not correlate very well with any of the category vectors; and, in this case, one should expect to have to search a larger number of categories in detail to do as well as a full search. Queries of this latter type in effect do not fit the classification structure. Second, the degree of association between each classification vector and the documents it represents (as reflected by Figure 4.10) is sufficiently small such that a wide