ISR10
Scientific Report No. ISR-10 Information Storage and Retrieval
The Query-Document Matching Function
chapter
Joseph John Rocchio
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
4-21
satisfy some comparison criterion may be determined. Assume further
that the elements can be grouped such that a comparison of the input
item with the representation of each group will determine whether any
of the group members' can satisfy the comparison condition. Under these
circumstances, assuming also that k groups are searched in detail, and
that the groups ar&of equal size,¼¼[OCRerr]eol number of comparisons
required, [OCRerr] may be written:
N `N (4.3)
t
where x is the number of categories and (N/x) is the population of
each category. Assuming that all elements of the collection have
equal a priori probability of matching an input item, the total
number of comparisons N is minimized for x = (k[OCRerr])[OCRerr]2[OCRerr]. Thus in an
t
ideal two level storage organization scheme 2(kN)21 comparisons are
required versus N for single level searching. The variation of Nt
`with the number of categ'ories (for k=1) is shown in Figure 4.3 on a
semi-log plot. Note that the minimum of the total number of
comparisons with number of categories in the classification is
relatively broad.
The' fo're' going analysis is applicable to the document
retrieval case `since all' documents can be considered to have equal
likelihoo'd" of' being relevant to an arbitrary search request (at least
in the absen'c'e ofany'e#idence tb the contrary). The classification'
categdries" should',' therefore, contain approximately the same number of
document images. To this end the criteria for `identifying a suitable