Scientific Report No. ISR-11
Information Storage and Retrieval

Design Criteria for Automatic Information Systems
M. E. Lesk and Gerard Salton
Harvard University

Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

3. Evaluation Results and Design Criteria

In attempting to generate useful criteria for the design of information systems, a number of obvious questions suggest themselves: first, can automatic text processing methods be used effectively to replace a manual content analysis; if so, what part or parts of a document should be incorporated in the automatic procedure; is it necessary to provide vocabulary normalization methods to eliminate ambiguities caused by homographs and synonymous word groups; should such a normalization be handled by means of a specially constructed dictionary, or is it possible to replace thesauruses completely by statistical word association methods; what dictionaries can most effectively be used for vocabulary normalization; is it important to provide hierarchical arrangements of subject categories as is done in many library classification systems; what should be the role of the user in formulating and controlling the search procedure. These and many other questions are considered in the evaluation process described in the remainder of this section.

A) Indexing Depth and Document Length

In a manual system, where each information item is identified by a few carefully chosen keywords, the presence or absence of a given keyword becomes of crucial importance, since failure to provide a certain needed keyword may mean the difference between a retrievable item and one which is not. In an automatic text processing system, it is possible to generate for each item many different information identifiers, as seen in Fig. 2 for the request of Fig.
1; the importance of each individual identifier is then much reduced, since a small number of poorly chosen terms are often offset by the much larger number of correct ones.
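
The contrast between shallow and deep indexing can be illustrated with a small sketch. It is not taken from the report: the request, the keyword sets, and the helper name `cosine` are hypothetical, and the cosine correlation over unweighted term counts is assumed here only as a convenient matching function. The sketch replaces one correct identifier ("retrieval") with a poorly chosen one ("search") and compares the effect on a two-keyword description with the effect on a twelve-term description.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine correlation between two term-count mappings (assumed matching function)."""
    dot = sum(weight * b.get(term, 0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical request identifiers (illustrative only, not from the report).
request = Counter(
    "automatic text processing content analysis document retrieval".split()
)

# Manual indexing: a few keywords per item.
manual_good = Counter(["retrieval", "indexing"])
manual_bad = Counter(["search", "indexing"])  # one poorly chosen keyword

# Automatic indexing: many identifiers generated from the text of the item.
auto_good = Counter(
    "automatic text processing methods replace manual content analysis "
    "document retrieval system evaluation".split()
)
auto_bad = auto_good.copy()
del auto_bad["retrieval"]  # the same single error as in the manual case
auto_bad["search"] += 1

scores = {
    "manual_good": cosine(request, manual_good),
    "manual_bad": cosine(request, manual_bad),
    "auto_good": cosine(request, auto_good),
    "auto_bad": cosine(request, auto_bad),
}

for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions the two-keyword description loses its only match with the request and its correlation falls to zero, whereas the twelve-term description falls only from about 0.76 to about 0.65, since the remaining correct identifiers still carry the match.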