ISR11
Scientific Report No. ISR-11 Information Storage and Retrieval
Information Analysis and Dictionary Construction
chapter
G. Salton
M. E. Lesk
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
and when the stems are weighted in accordance with their frequency within
the document. Furthermore, this process produces high precision if a
less than complete recall performance is desired, because documents
whose word stems match the stems present in the search requests are
generally found to be useful to the requestor.
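
As a minimal sketch of the frequency-weighted stem match described above, the following Python fragment builds stem-frequency vectors and ranks documents by their correlation with the search request. The cosine measure, the deliberately crude suffix-stripping function `stem`, and the helper names are illustrative assumptions, not the procedure used in the experiments reported here.

```python
import math
from collections import Counter

# Hypothetical, deliberately crude suffix list; it stands in for a real stemmer.
SUFFIXES = ("ing", "ers", "er", "es", "s")


def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem_vector(text):
    """Weight each stem by its frequency within the text."""
    return Counter(stem(w) for w in text.split())


def cosine(a, b):
    """Cosine correlation between two stem-frequency vectors (assumed measure)."""
    dot = sum(weight * b[s] for s, weight in a.items() if s in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(request, documents):
    """Return (score, name) pairs, best match first."""
    query = stem_vector(request)
    scored = [(cosine(query, stem_vector(text)), name)
              for name, text in documents.items()]
    return sorted(scored, reverse=True)
```

With this sketch, a request such as "computer program" ranks a document containing "computers" and "programs" above one that shares no stems with the request, since only matching stems contribute to the score.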
B) The Regular Thesaurus
The regular thesaurus provides synonym recognition and may therefore
be expected to be useful in retrieving some documents which cannot be
easily obtained by a word matching procedure alone. The results obtained
with two synonym dictionaries constructed for the computer literature are
shown in Fig. 11. The first dictionary, called "Harris 2", is a thesaurus
constructed by hand using ad hoc methods to group the terms included in
the thesaurus. The other dictionary, termed "Harris 3", was built using
the thesaurus construction principles, outlined in the preceding part,
which provide for the isolation of high frequency words and for the
elimination of many words whose information content is unclear. Fig. 11
shows a comparison between the retrieval effectiveness of the full null
thesaurus and the two regular thesauruses previously referred to.
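
The synonym recognition provided by a regular thesaurus can be illustrated with a small sketch. The thesaurus fragment below, its concept class numbers, and the function names are all invented for illustration and are not taken from Harris 2 or Harris 3. The sketch also assumes that words missing from the thesaurus are simply dropped.

```python
from collections import Counter

# Hypothetical thesaurus fragment: each word maps to a concept class number.
# The groupings are invented and do not reproduce Harris 2 or Harris 3.
THESAURUS = {
    "computer": 101, "calculator": 101, "processor": 101,
    "store": 202, "memory": 202,
    "retrieve": 303, "search": 303,
}


def concept_vector(text, thesaurus):
    """Count concept classes in the text; words not in the thesaurus are dropped."""
    return Counter(thesaurus[w] for w in text.lower().split() if w in thesaurus)


def shared_concepts(request, document, thesaurus):
    """Concept classes that occur in both the request and the document."""
    query = concept_vector(request, thesaurus)
    doc = concept_vector(document, thesaurus)
    return sorted(query.keys() & doc.keys())


def shared_words(request, document):
    """Plain word matching, for comparison with the thesaurus match."""
    return sorted(set(request.lower().split()) & set(document.lower().split()))
```

Under these assumptions, the request "search memory" and the document "retrieve from store" share no words, yet they match on two concept classes. This is the kind of document that a word matching procedure alone would not retrieve.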
It may be noticed first of all that the performance of the Harris 3
thesaurus is better throughout than that of the Harris 2 dictionary,
thus indicating the effectiveness of the thesaurus construction procedures
compared to ad hoc methods. Fig. 11 also indicates that the performance
of the null dictionary degrades as the recall values become larger.
Initially, the null thesaurus produces a higher precision than the Harris
2 dictionary, since false retrievals due to questionable synonyms