ISR10
Scientific Report No. ISR-10 Information Storage and Retrieval
Introduction
Joseph John Rocchio
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
a) the representation of the information content of source
documents, i.e. the indexing function;
b) the representation of the information needs of the users
of the system, i.e. the search request formulation
function;
c) the matching operation between search request
representations and source document representations, i.e.
the search or retrieval function.
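The three functions listed above can be illustrated with a minimal sketch. The sketch is not taken from the report: it assumes, purely for illustration, that both documents and requests are represented as term-count vectors and that matching is done by cosine correlation. All names (tokenize, index_document, formulate_request, match, retrieve) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def index_document(text):
    """(a) Indexing function: represent a document as term counts (assumed form)."""
    return Counter(tokenize(text))


def formulate_request(text):
    """(b) Request formulation function: represent the user's need the same way."""
    return Counter(tokenize(text))


def match(request, document):
    """(c) Search function: cosine correlation of the two representations."""
    # Counter returns 0 for terms the document does not contain.
    dot = sum(weight * document[term] for term, weight in request.items())
    norm_r = math.sqrt(sum(w * w for w in request.values()))
    norm_d = math.sqrt(sum(w * w for w in document.values()))
    if norm_r == 0 or norm_d == 0:
        return 0.0  # an empty representation matches nothing
    return dot / (norm_r * norm_d)


def retrieve(request_text, documents):
    """Return document ids ordered by decreasing match score (ties by id)."""
    request = formulate_request(request_text)
    scored = [
        (match(request, index_document(text)), doc_id)
        for doc_id, text in documents.items()
    ]
    return [doc_id for _, doc_id in sorted(scored, key=lambda p: (-p[0], p[1]))]
```

Under these assumptions, calling retrieve with a request text and a dictionary mapping document identifiers to document texts returns the identifiers in ranked order. Any other choice of representation or matching operation would fit the same three-part decomposition.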
In addition to this functional characterization, other elements of
document retrieval system organization are important in an operational
framework. Such characteristics as storage organization, input-output
facilities, document acquisition policy, economic factors and others
may be critical in an operational sense, but for the purposes of this
report these will be considered primarily as secondary factors. In
this sense, then, the main purpose here is to consider the logical and
methodological aspects of the mechanization of document retrieval
systems, and in so doing to ignore many of the operational factors
which may be important in other contexts.
The true information content of a document or segment of
natural language text might be defined as existing only in the mind of
its author. The representation of this content in recorded form via
the natural language can be considered as an attempt at communication.
That in fact such communication is successful on the average might in
part be measured in terms of human progress. In any case, the
information content of a document is a theoretically tenuous concept.