ISR10
Scientific Report No. ISR-10 Information Storage and Retrieval
The Query-Document Matching Function
Joseph John Rocchio
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
the matching function used to implement selection or ranking, a
retrieval operation is uniquely specified. Each input query is matched
with every reference document to specify the retrieved output. Matching
a user's search request against the full store of document index
images exploits, in effect, the maximum capabilities of the system.
For any but limited collections, however, the complexity of effective
matching operations makes a full search impractical. Useful retrieval
systems are then required to impose some organization on the document
store so as to limit the scope of the search to a document subset of
manageable size.
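The trade-off between a full search and a search restricted to a document subset can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each document index image is a sparse term-weight vector, that the matching function is the cosine correlation, and that the store has already been partitioned into groups summarized by centroid vectors. The names `full_search`, `subset_search`, `centroid`, and the sample terms are hypothetical and not taken from this report. The sketch counts query-vector comparisons: a full search matches the query with every document, while the restricted search matches it only with the group centroids and the members of the best-matching group.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical index image: a sparse map from term to weight.
Vector = Dict[str, float]
Ranking = List[Tuple[str, float]]


def cosine(q: Vector, d: Vector) -> float:
    """Assumed matching function: cosine correlation of two term-weight vectors."""
    dot = sum(w * d.get(t, 0.0) for t, w in q.items())
    nq = math.sqrt(sum(w * w for w in q.values()))
    nd = math.sqrt(sum(w * w for w in d.values()))
    return dot / (nq * nd) if nq and nd else 0.0


def full_search(q: Vector, docs: Dict[str, Vector]) -> Tuple[Ranking, int]:
    """Match the query with every document; return the ranking and the comparison count."""
    scored = [(name, cosine(q, d)) for name, d in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored, len(docs)


def centroid(vectors: List[Vector]) -> Vector:
    """Average the term weights of a group of documents."""
    c: Vector = {}
    for v in vectors:
        for t, w in v.items():
            c[t] = c.get(t, 0.0) + w / len(vectors)
    return c


def subset_search(q: Vector, docs: Dict[str, Vector],
                  groups: Dict[str, List[str]]) -> Tuple[Ranking, int]:
    """Match the query with each group centroid, then only with the best group's members."""
    # Centroids are rebuilt here for brevity; a real system would store them.
    cents = {g: centroid([docs[n] for n in members]) for g, members in groups.items()}
    best = max(cents, key=lambda g: cosine(q, cents[g]))
    ranked, doc_comparisons = full_search(q, {n: docs[n] for n in groups[best]})
    return ranked, len(cents) + doc_comparisons


# Hypothetical six-document store partitioned into three groups.
docs = {
    "d1": {"retrieval": 1.0, "index": 1.0},
    "d2": {"retrieval": 1.0, "query": 1.0},
    "d3": {"storage": 1.0, "file": 1.0},
    "d4": {"storage": 1.0, "disk": 1.0},
    "d5": {"time": 1.0, "sharing": 1.0},
    "d6": {"time": 1.0, "response": 1.0},
}
groups = {"A": ["d1", "d2"], "B": ["d3", "d4"], "C": ["d5", "d6"]}
q = {"retrieval": 1.0, "query": 1.0}

print(full_search(q, docs)[1], subset_search(q, docs, groups)[1])  # prints: 6 5
```

With three groups of two documents the saving is small (5 comparisons rather than 6). For N documents split into g groups of about N/g members each, the restricted search costs roughly g + N/g comparisons instead of N, at the risk of missing relevant documents filed under other groups.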
The necessity for storage organization is, in fact, likely to
become more stringent as research on automatic document retrieval
progresses. Advances in the techniques of automatic content analysis
are likely to lead to more complex index representations capable of
carrying more information. Such index representations, while allowing
for finer retrieval distinctions, necessarily require more time for
each basic comparison operation. In addition, the introduction of
operationally effective time-shared computer systems is likely to
produce significant changes in the organization of document retrieval
systems. In a real-time environment the response time of the system to
the user's demand plays a critical role in overall system performance.
As the time per query-document comparison increases due to increased
information in the index representations, the number of comparisons
possible per unit time decreases. Thus, even with the increasing speed
of information processing equipment, these factors suggest that some