Scientific Report No. ISR-10, Information Storage and Retrieval
The Query-Document Matching Function
Joseph John Rocchio, Harvard University
Gerard Salton

Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

[...] query. In addition, an expected loss of relevant documents equal to $[n(D_R) - n(D_T \cap D_R)]/n(D_R)$ will accompany the increase in search efficiency. If the relative cost of a query-document comparison is $c_1$, and the relative value of retrieving a relevant document is $c_2$, the search cost per input query of the base system may be expressed as:

$$C_s = c_1 N - c_2 \, n(D_R)$$

whereas with reduced searching, the search cost per query is:

$$C_s' = c_1 \sigma N - c_2 \, n(D_T \cap D_R)$$

That system of storage organization which minimizes the expectation of $C_s'$ over the population of input queries may be defined as the optimal reduced search strategy. Further, any reduced search strategy for which the expectation of $C_s'$ is less than the expectation of $C_s$ may be used to provide a net gain to the retrieval system user.

The above cost expressions are oversimplifications, since the relative costs are subjectively variable from user to user and since the total costs are probably not linear functions as assumed. But this formulation provides basic insight into the potential gain which may be realized from a classification-induced storage organization in a document retrieval system.
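The linear cost model described above can be illustrated with a small numerical sketch. Everything concrete here is an assumption made for illustration and does not come from the report: the function names, the reading of N as the collection size, the reading of the reduced search as examining a fixed fraction of the collection, and all of the sample figures. The sketch computes the per-query cost as (comparison cost times documents examined) minus (retrieval value times relevant documents retrieved), averages it over a handful of hypothetical queries, and checks whether the reduced search yields a net gain.

```python
from statistics import mean


def base_search_cost(c1, c2, n_docs, n_relevant):
    """Cost of a full search: every document is compared, every relevant one is retrieved."""
    return c1 * n_docs - c2 * n_relevant


def reduced_search_cost(c1, c2, n_docs, fraction_searched, n_relevant_found):
    """Cost of a reduced search: only a fraction of the collection is compared,
    and only the relevant documents inside that fraction are retrieved."""
    return c1 * fraction_searched * n_docs - c2 * n_relevant_found


def relevant_loss_fraction(n_relevant, n_relevant_found):
    """Fraction of the relevant documents that the reduced search fails to retrieve."""
    return (n_relevant - n_relevant_found) / n_relevant


# Hypothetical parameters (assumed for illustration only).
C1 = 1            # relative cost of one query-document comparison
C2 = 50           # relative value of retrieving one relevant document
N_DOCS = 1000     # collection size
FRACTION = 0.25   # share of the collection examined by the reduced search

# Hypothetical queries: (relevant documents in collection, relevant documents found by reduced search)
queries = [(10, 8), (4, 4), (6, 3)]

base_costs = [base_search_cost(C1, C2, N_DOCS, rel) for rel, _ in queries]
reduced_costs = [
    reduced_search_cost(C1, C2, N_DOCS, FRACTION, found) for _, found in queries
]

# Expectation over the (sample) population of input queries.
expected_base = mean(base_costs)
expected_reduced = mean(reduced_costs)

# The reduced strategy gives a net gain when its expected cost is lower.
net_gain = expected_reduced < expected_base

if __name__ == "__main__":
    print("expected base cost:   ", expected_base)
    print("expected reduced cost:", expected_reduced)
    print("net gain:             ", net_gain)
```

With figures like these, the reduced search loses some relevant documents on two of the three queries yet still has the lower expected cost, because the savings in comparisons outweigh the lost retrieval value. Raising the assumed value of a relevant document relative to the comparison cost shifts the balance back toward the full search, which is the sense in which the costs are subjective from user to user.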