ISR11
Scientific Report No. ISR-11 Information Storage and Retrieval
Information Analysis and Dictionary Construction
G. Salton
M. E. Lesk
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
dictionary. Such a committee-produced standard frequently ends by
satisfying no one, despite the enormous effort that goes into its
construction.
Clearly, if it were necessary to follow this particular pattern in
order to build a useful dictionary for retrieval purposes, then any saving
which might result from automatic search and retrieval methodology would
promptly be lost through the elaborate preparations required to build
dictionaries.
This situation has led to many efforts to produce dictionaries either
fully automatically or, at any rate, by more systematic procedures
than a committee-controlled process. Any reasonably standardized method
for dictionary construction not only saves time and decreases costs, but
also permits a great deal more latitude in the type of retrieval procedures
which can be implemented. The following principal advantages are evident:
1) the retrieval procedures can be extended to collections in many
different areas, since the dictionary problem no longer consti-
tutes an impediment;
2) it becomes possible to investigate differences in vocabulary
between different subject areas, notably the frequently heard
assertion that the vocabulary in some subject areas is "soft"
(that is, not well standardized and ambiguous), whereas in other
areas it is "hard";
3) it removes any possible differences in retrieval effectiveness
between different subject areas due to disturbances introduced
by varying methods of thesaurus construction;
4) it becomes possible to investigate the retrieval effectiveness
of a variety of thesauruses for a given collection, including
variations in the thesaurus size, in the number of concept
classes, and in the correspondents assigned to each class.
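
The kind of comparison described in point 4 can be illustrated with a
minimal sketch. It assumes that a thesaurus is simply a mapping from words
to concept class numbers, and that a query is matched against a document by
counting shared concept classes. The two thesauruses, the sample texts, and
the function names concept_vector and overlap are hypothetical and serve
only as illustration; they are not taken from the report.

```python
from collections import Counter


def concept_vector(text, thesaurus):
    """Replace each known word by its concept class and count the classes."""
    words = text.lower().split()
    return Counter(thesaurus[w] for w in words if w in thesaurus)


def overlap(query_vec, doc_vec):
    """Matching score: number of concept occurrences shared by both vectors."""
    return sum(min(count, doc_vec[c]) for c, count in query_vec.items())


# Hypothetical thesauruses over the same five words.
# "coarse" groups them into 2 concept classes, "fine" into 5.
coarse = {"dictionary": 1, "thesaurus": 1, "vocabulary": 1,
          "retrieval": 2, "search": 2}
fine = {"dictionary": 1, "thesaurus": 2, "vocabulary": 3,
        "retrieval": 4, "search": 5}

doc = "automatic thesaurus construction for document retrieval"
query = "dictionary search"

for name, thesaurus in [("coarse", coarse), ("fine", fine)]:
    score = overlap(concept_vector(query, thesaurus),
                    concept_vector(doc, thesaurus))
    print(name, score)
```

With the coarse thesaurus the query and the document share both concept
classes, whereas with the fine thesaurus they share none. Holding the
collection fixed and changing only the thesaurus in this way shows how the
number of concept classes can affect matching.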