ISR10
Scientific Report No. ISR-10 Information Storage and Retrieval
Introduction
chapter
Joseph John Rocchio
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
4
1-2
exists in the form of a collection of documents (where a document
connotes any segment of natural language text); a population of users
exists with reason to believe that the collection may contain information
pertinent to its needs. The problem, therefore, is in determining if
in fact there are relevant documents (i.e. documents with information
content useful to users) in the collection and in obtaining those which
may be found. It will be assumed that the determination of the
existence of relevant documents implies their identification and that
some unspecified means is available for obtaining tokens of such
documents once identified. In this context it should be noted that the
document retrieval problem is considered distinct from the data-
providing or fact retrieval problem. The information content of a
document in the former is considered an atomic element of the system;
and as such, a document or a set of documents (or unique referents
thereto) is provided in response to user demands. But in the
data-providing or fact retrieval problem, specific items of information,
e.g. facts, messages, statements, answers to questions, etc., are
extracted from source material and provided in response to users'
queries. Automatic data-providing systems raise a class of problems,
such as the mechanization of deductive and inductive inference, which
are not considered here.
2. A Functional Model
Any document retrieval system, automatic or manual, can be
functionally characterized by three basic elements: