Scientific Report No. ISR-11
Information Storage and Retrieval
Design Criteria for Automatic Information Systems
M. E. Lesk and Gerard Salton
Harvard University

Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

Rule 2: The use of information identifiers which are weighted in accordance with their presumed importance leads to large-scale improvements in retrieval effectiveness, compared with the use of unweighted terms.

B) Synonym Recognition

One of the perennial problems in automatic language analysis is the variability of language among authors, and the linguistic ambiguities which result. A large number of experiments have therefore been performed using a variety of synonym dictionaries for each of the three subject fields under study ("Harris 2" and "Harris 3" dictionaries for the computer literature, [OCRerr] or [OCRerr] lists for aeronautical engineering, and a regular thesaurus for documentation). An excerpt of such a synonym dictionary for the computer literature is shown in Fig. 7 for the concept class numbers [OCRerr]8 to [OCRerr]l6. Use of such a synonym dictionary permits the replacement of a variety of related terms by the corresponding concept classes, thus ensuring the retrieval of documents dealing with the "manufacture of transistor diodes" when the query deals with the "production of solid state rectifiers."

The output of Fig. 8 shows that considerable improvements in performance are obtainable by means of suitably constructed synonym dictionaries. The improvement is smallest for the Cranfield collection because the dictionary available for this collection was not originally constructed for retrieval purposes. This observation suggests that not all dictionaries are equally useful. Experiments conducted with the SMART system lead to the following principles of dictionary construction [13]: