Scientific Report No. ISR-11
Information Storage and Retrieval
Design Criteria for Automatic Information Systems
M. E. Lesk and G. Salton
Harvard University

Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

F) Manual Indexing

The Cranfield collections were available for purposes of experimentation both in the form of abstracts and in the form of manually assigned index terms. The indexing performed by subject experts is extremely detailed, consisting for some documents of over fifty index terms. As such, the indexing performance may be expected to be superior to the subject indexing normally used for large document collections. Nevertheless, the output of Fig. 14(a) shows that the retrieval results obtained by matching the index terms ("index null") are only slightly superior to those of the standard word stem matching procedure, which uses the words extracted from the document abstracts.

When the manual indexing procedure is compared with the word stem association process, it is seen in Fig. 14(b) that the word stem match with the associated terms is superior to the index term method. The same is true when manual indexing is compared with the regular thesaurus process. The output produced with the Cranfield collection then leads to the following rule:

Rule 8: Keyword matching systems based on manually assigned index terms are found (at least for one well-known document collection) to be not substantially superior to raw word matching techniques, and to be actually inferior to statistical word association and to thesaurus methods.

This rule is in complete contradiction to what one hears repeated over and over again by documentation and library science specialists. Moreover, as the collection sizes increase, the manual indexing procedure