ISR11
Scientific Report No. ISR-11 Information Storage and Retrieval
Information Analysis and Dictionary Construction
chapter
G. Salton
M. E. Lesk
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
be used. On the other hand if only a few items are to be retrieved, but
the user insists that these items must be relevant, then the specific
thesaurus categories will prove more useful. This then confirms the well-known
fact that any kind of retrieval tool must be constructed with the
retrieval environment in mind in which it is expected to operate.
Concerning now the problem of where a given term is to be put within
a given thesaurus organization, this depends largely on the type of user
who may be expected to avail himself of the retrieval system. As an
example, dictionaries constructed for a population of students may be
expected to require an organization somewhat different from that which would
be useful to advanced research scientists. The latter might, for example,
be interested in the specific physical characteristics of certain devices,
whereas the former are more interested in the uses of the devices. A
"transistor" could then appear in a category under "three terminal switching
devices", if the users were to be engineers, but it would appear under
"computer components", for a user population consisting of computer
programmers.
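
The dependence of term placement on the user population can be pictured with a minimal sketch. The category names are taken from the transistor example above; the population labels, the table name PLACEMENT, and the function category_for are hypothetical and serve only as illustration, not as a description of any actual dictionary format.

```python
# Hypothetical illustration: the same term is filed under a different
# thesaurus category depending on the expected user population.
# Category names follow the transistor example in the text; the
# population labels and all identifiers are assumptions.
PLACEMENT = {
    "engineers": {"transistor": "three terminal switching devices"},
    "programmers": {"transistor": "computer components"},
}


def category_for(term, population):
    """Return the category of `term` in the thesaurus built for
    `population`, or None if the term or population is not listed."""
    return PLACEMENT.get(population, {}).get(term.lower())


print(category_for("transistor", "engineers"))
print(category_for("transistor", "programmers"))
```

Keeping one table per population, rather than one shared table, reflects the point made above that a thesaurus is built for a particular retrieval environment.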
The following principles of thesaurus construction may then be
enunciated:
1) no very rare concepts should be included in the thesaurus since
these could not be expected to produce many matches between
documents and search requests;
2) very common high frequency terms should also be excluded from the
dictionary, since these produce too many matches for effective
retrieval (it is in fact possible to replace individual high
frequency terms by much more specific compound or hyphenated
terms; for example, terms such as "computer" or "control" might