ISR10 Scientific Report No. ISR-10 Information Storage and Retrieval The Query-Document Matching Function chapter Joseph John Rocchio Harvard University Gerard Salton Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government. 4-21 satisfy some comparison criterion may be determined. Assume further that the elements can be grouped such that a comparison of the input item with the representation of each group will determine whether any of the group members' can satisfy the comparison condition. Under these circumstances, assuming also that k groups are searched in detail, and that the groups ar&of equal size,¼¼[OCRerr]eol number of comparisons required, [OCRerr] may be written: N `N (4.3) t where x is the number of categories and (N/x) is the population of each category. Assuming that all elements of the collection have equal a priori probability of matching an input item, the total number of comparisons N is minimized for x = (k[OCRerr])[OCRerr]2[OCRerr]. Thus in an t ideal two level storage organization scheme 2(kN)21 comparisons are required versus N for single level searching. The variation of Nt `with the number of categ'ories (for k=1) is shown in Figure 4.3 on a semi-log plot. Note that the minimum of the total number of comparisons with number of categories in the classification is relatively broad. The' fo're' going analysis is applicable to the document retrieval case `since all' documents can be considered to have equal likelihoo'd" of' being relevant to an arbitrary search request (at least in the absen'c'e ofany'e#idence tb the contrary). The classification' categdries" should',' therefore, contain approximately the same number of document images. To this end the criteria for `identifying a suitable