ISR11
Scientific Report No. ISR-11 Information Storage and Retrieval
Information Analysis and Dictionary Construction
G. Salton
M. E. Lesk
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
precision, the dictionary without phrases is preferable. This result
reflects the feeling, already expressed in connection with the null
thesaurus, that the first few documents are best retrieved by the
simplest possible methods, when the chances of erroneous analysis are
smallest. The statistical phrase procedure, as well as the regular
thesaurus look-up, may occasionally generate a concept which is
in error. Such concepts may affect the retrieval results, thus depressing
precision. On the other hand, the increasingly sophisticated text
analysis which becomes possible through the phrase detection procedure
is undoubtedly responsible for retrieving at least some documents which
cannot be brought to the surface by other simpler methods. This accounts
for the beneficial effect of all well-built dictionaries in improving
the recall performance, usually at a loss in precision.*
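
The trade-off described above can be illustrated with a small computation. The following is a minimal sketch only: the document identifiers, the two retrieved lists, and the function name precision_recall are hypothetical and are not taken from the SMART experiments. It assumes the usual definitions, with precision as the fraction of retrieved documents that are relevant and recall as the fraction of relevant documents that are retrieved.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a list of retrieved document ids."""
    retrieved = set(retrieved)
    hits = len(retrieved & relevant)  # relevant documents actually retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data, invented for illustration; not results from the report.
relevant = {"d1", "d2", "d3", "d4"}

# A simple matching method: few documents retrieved, few erroneous matches.
simple_run = ["d1", "d2", "d9"]

# A richer analysis (e.g. with phrases): more relevant documents surface,
# but occasional erroneous concepts also bring in nonrelevant ones.
phrase_run = ["d1", "d2", "d3", "d7", "d8", "d9"]

p_simple, r_simple = precision_recall(simple_run, relevant)
p_phrase, r_phrase = precision_recall(phrase_run, relevant)

print(f"simple: precision={p_simple:.2f} recall={r_simple:.2f}")
print(f"phrase: precision={p_phrase:.2f} recall={r_phrase:.2f}")
```

In this invented case the richer analysis retrieves one more relevant document, so recall rises, while the extra nonrelevant documents lower precision.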
The observed usefulness of synonym and phrase dictionaries raises
the important question of how such dictionaries are best prepared. This
question is examined in more detail in the next part.
5. Automatic Thesaurus Construction
Under normal circumstances, the task of constructing a subject dictionary
for a given topic area is one which demands many skills, as well as
a great deal of persistence and tenacity. It is usually not enough
to be a subject expert in a given area; training in linguistics and
philosophy is normally expected as well. Furthermore, since the task is
of large proportions, a committee is often appointed which thrashes out
controversial questions and eventually produces a suggested standard
* The search results exhibited in this report for documents and dictionaries
in the computer literature have been confirmed for other subject areas,
including aeronautical engineering and documentation, also processed
with the SMART programs.