IRS13
Scientific Report No. IRS-13 Information Storage and Retrieval
Thesaurus, Phrase and Hierarchy Dictionaries
E. M. Keen
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
a hierarchy brings in too many words, and permits combinations of individual
words to be compounded that give no useful grouping for retrieval. A
hierarchy based on, say, the stem dictionary might give better results,
and tests of a hierarchy based on suffix 's' will be made for the Cran-1
collection, but this particular Cran-1 hierarchy (constructed at Cranfield)
is very difficult to construct, and it did not perform well at Cranfield.
Since hierarchies are normally based partly on phrases and partly on single
words, any new work in phrase processing would provide a much more
interesting environment, in which a hierarchy could be constructed and tested.
The inclusion of a hierarchy within an automatic system does seem to
require the user to examine portions of the hierarchy in relation to their
particular search request, since the many optional uses of hierarchy, such
as "parents", "sons", etc., would require some definite pre-search choice
of the relation to be used.
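
The pre-search choice of relation can be illustrated with a minimal sketch. It assumes a toy hierarchy stored as child-to-parent links; the concept names, the `PARENT` table, and the `expand` function are hypothetical and are not taken from the Cranfield hierarchy or from the system described in this report.

```python
from collections import defaultdict

# Hypothetical toy hierarchy: each concept maps to its parent (None marks a root).
PARENT = {
    "aerodynamics": None,
    "flow": "aerodynamics",
    "laminar flow": "flow",
    "turbulent flow": "flow",
    "boundary layer": "aerodynamics",
}

# Invert the parent links once so that "sons" can be looked up directly.
SONS = defaultdict(list)
for concept, parent in PARENT.items():
    if parent is not None:
        SONS[parent].append(concept)


def expand(concepts, relation):
    """Return the request concepts plus those reached by one chosen relation."""
    if relation not in ("parents", "sons"):
        raise ValueError(f"unknown relation: {relation}")
    expanded = set(concepts)
    for c in concepts:
        if relation == "parents":
            parent = PARENT.get(c)
            if parent is not None:
                expanded.add(parent)
        else:  # relation == "sons"
            expanded.update(SONS.get(c, []))
    return sorted(expanded)


# The same request concept expands differently depending on the relation chosen.
with_parents = expand(["flow"], "parents")
with_sons = expand(["flow"], "sons")
```

Here the "parents" relation broadens the request to a more general concept, while "sons" adds the narrower ones, so the relation has to be fixed before the search is run.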

This analysis and discussion of the phrases and hierarchy has
shown that, in their present form, these two types of dictionary do not
improve the thesaurus process by an amount that would justify the effort
required for construction. Indeed, it might even be questioned whether
the effort of constructing a thesaurus itself is worthwhile, since results
such as those given in Figs. 14, 15, and 16 prove that the improvement of
performance in comparison with the stem dictionary is not really large.
In situations where economic considerations are all-important, or time is
very limited, it seems that an automatic stem dictionary will perform
quite well, particularly for the high precision user. It is disappointing
that the thesauruses tested do not always help the high recall user;