Scientific Report No. ISR-11
Information Storage and Retrieval
Information Analysis and Dictionary Construction
Gerard Salton and M. E. Lesk
Harvard University
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

dictionaries may also exist, containing terms or categories which should not be used for purposes of information identification.

In view of the importance of the initial information analysis and classification - all later search and retrieval operations are of course of no avail in the absence of a careful and consistent determination of information content - it is appropriate to examine in detail the problems connected with the generation and use of dictionaries. Accordingly, the present study specifies the form of a variety of dictionaries which have been found useful in information analysis, and examines some of the principles of dictionary construction. Emphasis is placed on those dictionaries which can be used for natural language analysis, since many of the information items and of the search requests to be stored may be expected to be expressed by words or word strings in the natural language. Performance characteristics are given, based on search results obtained with various dictionaries, and several methods are suggested for the construction of dictionaries by semi-automatic means.

2. Language Analysis

Consider the problem of taking a document or search request in the natural language, and of attempting to use some automatic procedure to generate content identifications for the input texts. Such a task immediately raises many difficulties brought about by the complexity of the language, and by the irregularities which govern the syntactic and semantic structure.
The following principal problems must be dealt with [1]: 1) words which carry out syntactic functions but which do not contribute directly to the specification of information content