ISR10
Scientific Report No. ISR-10 Information Storage and Retrieval
Introduction
chapter
Joseph John Rocchio
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
In document retrieval, one does not necessarily desire to extract and
represent the "content" of a document, but rather to characterize that
content in a manner which can consistently lead to the recovery of its
primary representation, namely the document (the representation of the
natural language).
The first functional aspect of a document retrieval system,
therefore, involves the means for representing or characterizing the
information content of source documents. Traditionally, this is the
process of subject indexing. Useful referents to documents in a
retrieval system may be indicative of attributes other than information
content. In particular, referents such as the author's name,
publication date, journal or publisher identification, cited
references, etc., can be useful in several contexts. For the current
purposes, however, those referents not directly indicative of
information content will be ignored with the understanding that their
practical usefulness to the retrieval process as a whole must be
considered in special circumstances. Chapter 2 of this study considers
the role of indexing in document retrieval systems. The index function
is discussed in terms of its goals, as well as in terms of the
linguistic aspects of its mechanization, and of the possibilities of
optimization of automatic indexing techniques.
The second functional aspect of the retrieval system, that is
the search request formulation, is primarily a user function. In the
broad sense it is also a system function in that a retrieval system
includes the user. In a narrower sense, however, when the system is