ISR11
Scientific Report No. ISR-11 Information Storage and Retrieval
Summary
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
Summary
The present report is the eleventh in a series covering research
in automatic storage and retrieval conducted initially at the Computation
Laboratory of Harvard University, and more recently jointly undertaken
by Harvard and by the Department of Computer Science of Cornell University.
From the outset, the design of automatic information systems was of
principal concern, and the research dealt specifically with the evaluation
of a variety of fully automatic methods for information analysis and search.
This work resulted in the design of an experimental, fully automatic
document retrieval system, called SMART, operating on an IBM 7094 computer,
and described in detail in two previous reports in this series, numbered
ISR-7 dated June 1964, and ISR-9 dated August 1965.
The SMART system is characterized by the fact that documents and search
requests are handled in the natural language without any prior manual
analysis, and are processed by one of many different content analysis
procedures incorporated into the system. Among these are various statistical
and syntactic language analysis methods, and table look-up routines based
on a variety of dictionaries and thesauruses. The dictionaries are normally
constructed not by committees of subject experts, but semi-automatically
starting with representative document collections for each subject area.
Since it is unreasonable to expect that the documents retrieved by a single
search of the collection should provide adequate answers to all users in
all circumstances, iterative search procedures have been used in conjunction