Scientific Report No. ISR-11 Information Storage and Retrieval

Harvard University
Gerard Salton

Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

Summary

The present report is the eleventh in a series covering research in automatic storage and retrieval conducted initially at the Computation Laboratory of Harvard University, and more recently jointly undertaken by Harvard and by the Department of Computer Science of Cornell University. From the outset, the design of automatic information systems was of principal concern, and the research dealt specifically with the evaluation of a variety of fully automatic methods for information analysis and search. This work resulted in the design of an experimental, fully automatic document retrieval system, called SMART, operating on an IBM 7094 computer, and described in detail in two previous reports in this series, numbered ISR-7 dated June 1964, and ISR-9 dated August 1965.

The SMART system is characterized by the fact that documents and search requests are handled in the natural language without any prior manual analysis, and are processed by one of many different content analysis procedures incorporated into the system. Among these are various statistical and syntactic language analysis methods, and table look-up routines based on a variety of dictionaries and thesauruses. The dictionaries are normally constructed not by committees of subject experts, but semi-automatically starting with representative document collections for each subject area.

Since it is unreasonable to expect that the documents retrieved by a single search of the collection should provide adequate answers to all users in all circumstances, iterative search procedures have been used in conjunction