Scientific Report No. ISR-11
Information Storage and Retrieval
Design Criteria for Automatic Information Systems
M. E. Lesk, G. Salton
Harvard University
Gerard Salton

Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

A second principal difference between manual and automatic information analysis systems is the relative difficulty in manual systems of discriminating among keywords by weights assigned to reflect their relative importance. This results in the "all or nothing" situation where a given identifier is either present or not, and each identifier is considered to be equally important. In an automatic system, on the other hand, it is easy to assign weights to individual identifiers, as shown in Fig. 2. These weights can be derived in part by using the frequency of occurrence of the original text words, and in part as a function of the various dictionary mapping procedures. Thus, ambiguous terms, which in a synonym dictionary correspond to many different concept classes, can be weighted less than unambiguous terms.

The relative usefulness of analyzing document sections of varying lengths, and of utilizing weighted terms, is reflected in the output of Figs. 5 and 6. These recall-precision graphs exhibit output averaged over 17 search requests for the IRE-2 collection and over 35 requests for the ADI material. Since it is in general desirable to get both high recall (that is, to retrieve most of what is relevant) and high precision (that is, to retrieve very little that is irrelevant), the region of importance is the upper right-hand corner of each graph. The more effective a given retrieval algorithm, the smaller will be the distance between the corresponding recall-precision curve and the 1:1 recall-precision point. Fig.
5(a) shows a comparison of a "title only" option, where only the titles of documents are used in the analysis, with a "full abstract" option. In both cases, the word stems originally extracted from document titles and document abstracts were first looked up in a synonym dictionary