IRS13
Scientific Report No. IRS-13 Information Storage and Retrieval
Thesaurus, Phrase and Hierarchy Dictionaries
chapter
E. M. Keen
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
search requests is quite commonplace in document retrieval. In addition,
the words grouped in the thesaurus dictionary may display hierarchical
relationships; for example, concept 22 of the Cran-2 Thesaurus-3 groups
both "algebra" and "arithmetic" with the generic notion of "mathematics"
(Fig. 2). Hierarchy dictionaries tested have been constructed by struc-
turing the thesaurus concepts themselves, rather than by going back to
the separate words or word stems. Hierarchies have been manually con-
structed only for the IRE Computer Science collection, and descriptions
of the methods used in their construction have appeared in [2,3,5,8,13,14].
Discussion and evaluation of procedures for automatically producing
hierarchies by co-occurrence statistics are also not considered here (see [2,15]).
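
The relationship between a thesaurus and a hierarchy described above can be illustrated with a minimal sketch. It is not the SMART implementation. Only the grouping of "algebra", "arithmetic" and "mathematics" under concept 22 comes from the text. Every other entry, the parent concept numbers, the function names `to_concepts` and `add_parents`, the dropping of words absent from the thesaurus, and the reduced weight given to parent concepts are assumptions made for illustration.

```python
# Hypothetical thesaurus: word -> concept number.
# Only concept 22 is taken from the text; the other entries are invented.
THESAURUS = {
    "algebra": 22,
    "arithmetic": 22,
    "mathematics": 22,
    "wing": 101,     # assumed entry
    "airfoil": 101,  # assumed entry
}

# Hypothetical hierarchy, built over concept numbers rather than words
# (child concept -> parent concept). All numbers here are assumptions.
HIERARCHY = {22: 500, 101: 600}


def to_concepts(words):
    """Map words to a concept-frequency vector.

    Words missing from the thesaurus are dropped (an assumption).
    """
    counts = {}
    for word in words:
        concept = THESAURUS.get(word.lower())
        if concept is not None:
            counts[concept] = counts.get(concept, 0) + 1
    return counts


def add_parents(concepts, weight=0.5):
    """Add each concept's parent to the vector at a reduced weight.

    The weight of 0.5 is an arbitrary illustrative choice.
    """
    expanded = dict(concepts)
    for concept, count in concepts.items():
        parent = HIERARCHY.get(concept)
        if parent is not None:
            expanded[parent] = expanded.get(parent, 0) + weight * count
    return expanded
```

For example, `to_concepts(["Algebra", "arithmetic", "flow"])` yields a vector with a single entry for concept 22, and `add_parents` then adds the assumed parent concept 500. Because the hierarchy is keyed on concept numbers, it never needs to refer back to the separate words or word stems.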
5. Retrieval Performance Results
A) Thesaurus Dictionaries
Performance comparisons are normally made between the stem and
thesaurus dictionaries, and a series of comparisons using normalized recall
and precision are given in Figs. 5, 6, and 7. The results in Fig. 5 are all
based on the cosine numeric matching function, and it may be seen that even
with different document input lengths, the thesaurus dictionaries are nearly
always superior to stem. Reasonable explanations can be found for the two main
exceptions. The first is the Cran-1 Thesaurus-1, which was made without the use of any of
the construction rules; furthermore, it was based on the indexing only,
omitting many words which appeared in the abstracts. The second exception
is the ADI "Hastie" Thesaurus-SA1, which was made by semi-automatic procedures
and was known to contain unsatisfactory groupings. Figs. 6 and 7 give,
respectively, some results based on the Cosine Logical and Overlap Logical