Scientific Report No. ISR-10 Information Storage and Retrieval
The Query-Document Matching Function
Joseph John Rocchio
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
form of storage organization or document classification will be
necessary to achieve economic retrieval from large collections with
response times fast enough for a real time environment.
Classification may be regarded as a part of the general
problem of content analysis. When a document is classified under
some given subject heading, its information content has been found to
be related to that area of discourse. A classification system,
however, is rarely used for retrieval in the sense that a user can be
satisfied by all the references assigned to some given category. The
classification schedule in general provides a means of storage
organization which allows a user to limit the scope of his search. In
this sense the process of document classification is analogous to the
document indexing process. The index image of a document characterizes
the information content of that document while a classification
category normally characterizes the information content of some area
of discourse in the general field of knowledge. The assignment of
some set of documents to a category then, in effect, creates an index
image for the information content of the entire set. The user
matches his information needs against the categories of the
classification system to select subsets of documents in the same way
in which his search request is matched with individual document
representations to select particular references. Thus, in automatic
document retrieval systems, as in conventional library systems,
document classification provides the key for a storage organization
which can effectively limit the number of references which must be
examined in detail in a given retrieval operation.
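
The two-level matching described above can be illustrated with a minimal
sketch. Several choices here are assumptions made for illustration and
are not taken from the text: documents are represented as sparse
term-weight vectors, the index image of a category is taken to be the
average (centroid) of its members' vectors, and both matches use a
cosine correlation. The names cosine, centroid, and two_stage_search,
and the sample data, are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine correlation between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Average of a set of vectors; an assumed stand-in for a category's index image."""
    total = {}
    for vec in vectors:
        for term, weight in vec.items():
            total[term] = total.get(term, 0.0) + weight
    return {term: weight / len(vectors) for term, weight in total.items()}


def two_stage_search(query, categories, top_categories=1):
    """Match the query against category images, then against member documents.

    categories maps a category name to {doc_id: vector}.
    Only documents in the best-matching categories are examined in detail.
    """
    # Stage 1: rank categories by how well their index image matches the query.
    images = {name: centroid(list(docs.values())) for name, docs in categories.items()}
    ranked = sorted(images, key=lambda name: cosine(query, images[name]), reverse=True)
    selected = ranked[:top_categories]

    # Stage 2: match the query against individual documents in those categories.
    hits = []
    for name in selected:
        for doc_id, vec in categories[name].items():
            hits.append((doc_id, cosine(query, vec)))
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return selected, hits


# Hypothetical four-document collection in two categories.
categories = {
    "retrieval": {
        "d1": {"index": 1.0, "query": 1.0},
        "d2": {"query": 1.0, "search": 1.0},
    },
    "hardware": {
        "d3": {"memory": 1.0, "disk": 1.0},
        "d4": {"disk": 1.0, "tape": 1.0},
    },
}
query = {"query": 1.0, "search": 1.0}

selected, hits = two_stage_search(query, categories)
print(selected)
print([doc_id for doc_id, _ in hits])
```

In this sketch only the two documents of the selected category are
compared with the request in detail; the other half of the collection
is excluded by the single comparison against its category image.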