ISR10
Scientific Report No. ISR-10 Information Storage and Retrieval
The Query-Document Matching Function
chapter
Joseph John Rocchio
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
where δ is the average [...] per input query. In addition, an expected
[...] loss of relevant documents equal to
n([...])/n(D_R) will accompany the increase in search efficiency.
If the relative cost of a query-document comparison is c_1, and the
relative value of retrieving a relevant document is c_2, the search
cost per input query of the base system may be expressed as:
    C_s = c_1 N - c_2 n(D_R ∩ R)
whereas with reduced searching, the search cost per query is:
    C_s' = c_1 δN - c_2 n(D_R ∩ R_C)
That system of storage organization which minimizes the expectation of
C_s' over the population of input queries may be defined as the optimal
reduced search strategy. Further, any reduced search strategy for
which the expectation of C_s' is less than the expectation of C_s may
be used to provide a net gain to the retrieval system user.
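
The comparison of expected costs can be illustrated with a short
numerical sketch. It assumes that the cost of a full search is c_1 times
the collection size minus c_2 times the number of relevant documents
retrieved, and that a reduced search examines only a fixed fraction of
the collection. The function names, the parameter names, and all
numbers are hypothetical and chosen for illustration only; they do not
appear in the report.

```python
from statistics import mean


def base_cost(c1, c2, n_docs, relevant_retrieved):
    """Cost of one query when every document is compared with it."""
    return c1 * n_docs - c2 * relevant_retrieved


def reduced_cost(c1, c2, n_docs, fraction_searched, relevant_retrieved):
    """Cost of one query when only a fraction of the collection is compared."""
    return c1 * fraction_searched * n_docs - c2 * relevant_retrieved


def expected_net_gain(c1, c2, n_docs, fraction_searched, queries):
    """Mean base cost minus mean reduced cost over a sample of queries.

    `queries` holds one pair per query (an assumed layout):
    (relevant documents retrieved by the full search,
     relevant documents retrieved by the reduced search).
    A positive result means the reduced strategy is cheaper on average.
    """
    base = mean(base_cost(c1, c2, n_docs, full) for full, _ in queries)
    reduced = mean(
        reduced_cost(c1, c2, n_docs, fraction_searched, part)
        for _, part in queries
    )
    return base - reduced


if __name__ == "__main__":
    # Hypothetical sample: three queries against a 1000-document collection,
    # with one quarter of the collection examined in the reduced search.
    sample = [(10, 8), (8, 7), (12, 9)]
    print(expected_net_gain(1.0, 50.0, 1000, 0.25, sample))
```

With these assumed values the reduced search loses a few relevant
documents per query, yet the saving in comparisons outweighs that loss,
so the expected net gain is positive. A larger c_2 relative to c_1
would shrink or reverse the gain.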
The above cost expressions are oversimplifications since the
relative costs are subjectively variable from user to user and since
the total costs are probably not linear functions as assumed. But
this formulation provides basic insight into the potential gain
which may be realized from a classification induced storing organization
in a document retrieval system.