Scientific Report No. IRS-13 Information Storage and Retrieval
Thesaurus, Phrase and Hierarchy Dictionaries
E. M. Keen
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
VII. Thesaurus, Phrase and Hierarchy Dictionaries
E. M. Keen
1. Introduction
The suffix removal procedures described in Section VI provide
synonym control only when identical word stems are involved; any comprehensive
recognition of synonyms and partial synonyms requires a procedure that
groups words according to synonymy, irrespective of spelling. For this
reason, the use of dictionaries of the thesaurus type is being investigated,
as well as the use of phrases rather than single words, and the use
of word relations as specified by hierarchical arrangements. The construction
characteristics of several dictionaries are discussed in the present
section; retrieval runs are then presented, using results for
three document collections.
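The distinction drawn above between stem matching and thesaurus matching may be illustrated by a minimal sketch in Python. The suffix list, the thesaurus entries, the concept numbers and all function names are hypothetical, chosen only for illustration; they are not taken from the suffix procedures of Section VI or from the dictionaries described in this section.

```python
from typing import Optional

# Hypothetical suffix list, far smaller than any real suffix dictionary.
SUFFIXES = ("ing", "ed", "es", "s")

# Hypothetical thesaurus: word stems mapped to invented concept numbers.
# Stems sharing a number are treated as synonyms or partial synonyms.
THESAURUS = {
    "computer": 1,
    "calculator": 1,
    "retrieval": 2,
    "search": 2,
}


def stem(word: str) -> str:
    """Remove the first matching suffix, keeping at least three letters."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


def concept(word: str) -> Optional[int]:
    """Look up the concept number of a word's stem, if the thesaurus has one."""
    return THESAURUS.get(stem(word))


def same_by_stem(a: str, b: str) -> bool:
    """Suffix removal alone: words match only when their stems are identical."""
    return stem(a) == stem(b)


def same_by_thesaurus(a: str, b: str) -> bool:
    """Thesaurus lookup: words match when they share a concept number."""
    ca, cb = concept(a), concept(b)
    return ca is not None and ca == cb


if __name__ == "__main__":
    for pair in [("computers", "computer"), ("computer", "calculators")]:
        print(pair, same_by_stem(*pair), same_by_thesaurus(*pair))
```

In this sketch "computers" and "computer" are matched by suffix removal alone, whereas "computer" and "calculators" are matched only through the thesaurus, since their stems are spelled differently.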
2. Description of Thesaurus Dictionaries
Seven thesaurus dictionaries are currently available, and each is
referred to as follows:
1. IRE-3 Thesaurus-2. Known also as the "Harris 2" thesaurus, this
handmade dictionary was originally constructed for use specifically
with the IRE-1 collection.
2. IRE-3 Thesaurus-3. Known also as the "Harris 3" thesaurus, this
handmade dictionary was constructed for use with any collection
of computer science documents, and was first tested on the IRE-2
collection.
3. CRAN-1 Thesaurus-1. Known also as the "Old Quasi-Synonym"
dictionary, this is a modified manually-constructed version of