Scientific Report No. ISR-11
Information Storage and Retrieval
Information Analysis and Dictionary Construction (chapter)
G. Salton and M. E. Lesk, Harvard University
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

and when the stems are weighted in accordance with their frequency within the document. Furthermore, this process produces high precision when less than complete recall is acceptable, because documents whose word stems match the stems present in the search requests are generally found useful by the requestor.

B) The Regular Thesaurus

The regular thesaurus provides synonym recognition and may therefore be expected to retrieve some documents that cannot easily be obtained by word matching alone. Fig. 11 shows the results obtained with two synonym dictionaries constructed for the computer literature. The first, called "Harris 2", was constructed by hand, using ad hoc methods to group its terms. The second, termed "Harris 3", was built using the thesaurus construction principles outlined in the preceding part, which provide for isolating high-frequency words and for eliminating many words whose information content is unclear.

Fig. 11 compares the retrieval effectiveness of the null thesaurus with that of these two regular thesauruses. Note first that the Harris 3 thesaurus performs better than the Harris 2 dictionary throughout, which indicates that the systematic thesaurus construction procedures are more effective than ad hoc methods. Fig. 11 also shows that the performance of the null dictionary degrades as the recall values become larger.
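The difference between the null thesaurus (frequency-weighted word-stem matching) and a regular thesaurus (synonym recognition through concept classes) can be illustrated with a minimal sketch. Everything here is an illustrative assumption rather than the report's procedure: the crude suffix list stands in for a real stemming routine, the small hypothetical `THESAURUS` dictionary stands in for Harris 2 or Harris 3, the sample texts are invented, and cosine correlation is assumed as the matching function.

```python
from collections import Counter
from math import sqrt

# Hypothetical suffix list: a crude stand-in for a real stemming procedure.
SUFFIXES = ("ing", "ers", "er", "es", "s", "ed")

# Hypothetical thesaurus: maps word stems to concept-class labels.
THESAURUS = {"storage": "MEM", "memory": "MEM", "devic": "UNIT", "unit": "UNIT"}


def stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def null_vector(text: str) -> Counter:
    """Null thesaurus: each stem is its own term, weighted by its frequency."""
    return Counter(stem(w) for w in text.lower().split())


def thesaurus_vector(text: str, thesaurus: dict) -> Counter:
    """Regular thesaurus: stems found in the dictionary become concept classes."""
    return Counter(thesaurus.get(s, s) for s in null_vector(text).elements())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine correlation between two weighted term vectors (assumed measure)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank(query: str, docs: dict, vectorize) -> list:
    """Return (doc_id, score) pairs sorted by decreasing correlation."""
    q = vectorize(query)
    scores = {d: cosine(q, vectorize(text)) for d, text in docs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


query = "storage devices"
docs = {"d1": "disk storage devices", "d2": "memory units"}

print(rank(query, docs, null_vector))
print(rank(query, docs, lambda t: thesaurus_vector(t, THESAURUS)))
```

With stem matching alone, "memory units" shares no stem with the request and scores zero; once "storage"/"memory" and "devic"/"unit" are grouped into the same concept classes, it becomes the top-ranked document. The same grouping is also the source of risk: an entry that merges words of only questionable synonymy would raise the scores of irrelevant documents in exactly the same way.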
Initially, the null thesaurus produces a higher precision than the Harris 2 dictionary, since false retrievals due to questionable synonyms