Scientific Report No. ISR-11, Information Storage and Retrieval. Information Analysis and Dictionary Construction (chapter). G. Salton, M. E. Lesk. Harvard University. Gerard Salton. Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

IV-12

be used. On the other hand if only a few items are to be retrieved, but the user insists that these items must be relevant, then the specific thesaurus categories will prove more useful. This then confirms the well-known fact that any kind of retrieval tool must be constructed with the retrieval environment in mind in which it is expected to operate.

Concerning now the problem of where a given term is to be put within a given thesaurus organization, this depends largely on the type of user which may be expected to avail himself of the retrieval systems. As an example, dictionaries constructed for a population of students may be expected to require an organization somewhat different from that which would be useful to advanced research scientists. The latter might, for example, be interested in the specific physical characteristics of certain devices, whereas the former are more interested in the uses of the devices. A "transistor" could then appear in a category under "three terminal switching devices", if the users were to be engineers, but it would appear under "computer components", for a user population consisting of computer programmers.
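The dependence of category placement on the user population may be illustrated by a minimal sketch. The table layout, the population labels "engineers" and "programmers", and the function name `category_for` are assumptions introduced here for illustration only; the report does not specify how the thesaurus is stored.

```python
# Hypothetical illustration: the same term is filed under a different
# thesaurus category depending on the expected user population.
# The two entries reproduce the "transistor" example from the text;
# the table layout and all names are assumptions.
THESAURUS_BY_POPULATION = {
    "engineers": {"transistor": "three terminal switching devices"},
    "programmers": {"transistor": "computer components"},
}


def category_for(term, population):
    """Return the category of `term` for a user population, or None if absent."""
    entries = THESAURUS_BY_POPULATION.get(population, {})
    return entries.get(term.lower())


if __name__ == "__main__":
    for population in ("engineers", "programmers"):
        print(population, "->", category_for("transistor", population))
```

Keeping one table per population, rather than one global table, is only one possible arrangement; it is chosen here because it makes the point of the passage directly visible.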
The following principles of thesaurus construction may then be enunciated: 1) no very rare concepts should be included in the thesaurus since these could not be expected to produce many matches between documents and search requests; 2) very common high frequency terms should also be excluded from the dictionary, since these produce too many matches for effective retrieval (it is in fact possible to replace individual high frequency terms by much more specific compound or hyphenated terms; for example, terms such as "computer" or "control" might