Scientific Report No. ISR-11
Information Storage and Retrieval
Design Criteria for Automatic Information Systems
M. E. Lesk and G. Salton
Harvard University

Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

... (called "Harris 3" in Fig. 5), to provide vocabulary normalization before the actual word matching operation. The curve of Fig. 5(a) makes clear how superior the full abstract process is to the title procedure. If the text words had been matched directly, without a thesaurus intermediary, the discrepancy between the two procedures would be even larger.

The output of Fig. 5(b) shows that a further improvement is obtainable if full text is used rather than only abstracts, particularly in the high recall region. However, the improvement is much smaller here, and in actual practice it would seem that the additional problems arising from a full text process can be avoided by restricting the procedure to abstracts and summaries, unless a clear requirement exists for high recall performance. The output of Fig. 5 then leads to the following rule:

Rule 1: The use of document titles alone for purposes of information analysis results in poor retrieval performance compared with the use of abstracts or full text.

Rule 1 is of particular interest because of the widespread advocacy of permuted title indexes (also known as KWIC indexes) for information search and retrieval purposes.

Fig. 6 shows the improvement obtainable by using weighted word stems, compared with unweighted stems. It is clear from the figure that term weights are essential for retrieval purposes, and it can be inferred that one of the main drawbacks of presently operating keyword search systems is the lack of discrimination between terms of varying importance. Rule 2 can then be stated as follows: