Scientific Report No. IRS-13 Information Storage and Retrieval
Thesaurus, Phrase and Hierarchy Dictionaries
E. M. Keen
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
only from a performance analysis viewpoint, since the combinations of
document lengths (e.g. titles), overlap correlation and logical vectors
are known to be inferior to the regular results obtained with abstracts,
cosine correlation and numeric vectors.
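The options contrasted above (cosine versus overlap correlation, numeric versus logical vectors) can be illustrated with a small sketch. It assumes that concept vectors are dictionaries mapping concept numbers to weights. It also assumes one common form of the overlap coefficient (shared weight divided by the smaller vector's total weight), which may not match the exact formula used in these runs. The function names are hypothetical.

```python
import math
from typing import Dict

Vector = Dict[int, float]  # assumed representation: concept number -> weight


def cosine(u: Vector, v: Vector) -> float:
    """Cosine correlation: inner product divided by the product of vector lengths."""
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def overlap(u: Vector, v: Vector) -> float:
    """Assumed overlap form: summed minimum of shared weights over the smaller total weight."""
    shared = sum(min(w, v[c]) for c, w in u.items() if c in v)
    smaller = min(sum(u.values()), sum(v.values()))
    return shared / smaller if smaller else 0.0


def logical(u: Vector) -> Vector:
    """Logical (binary) vector: every assigned concept receives weight 1."""
    return {c: 1.0 for c, w in u.items() if w > 0}


request = {3: 12.0, 7: 4.0}
document = {3: 6.0, 9: 2.0}
print(cosine(request, document))                     # numeric vectors, cosine
print(overlap(request, document))                    # numeric vectors, overlap
print(cosine(logical(request), logical(document)))   # logical vectors, cosine
```

With these made-up weights the numeric cosine is 0.9, while the logical cosine drops to 0.5 because the weight information is discarded.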
Fig. 18 presents data already given in Figs. 8, 9, and 11, but
here the performance of successive versions of essentially the same
dictionary can be compared; in every case the later version produces some
improvement.
B) Phrase and Hierarchy Dictionaries
Since both phrase and hierarchy dictionaries are based on the
groupings made within a given thesaurus, performance comparisons will be made
between the thesaurus alone on the one hand, and the thesaurus used with
either phrases or hierarchy on the other. Using the normalized evaluation
measures, four comparisons involving phrases are given in Fig. 19, and
five comparisons with hierarchy appear in Fig. 20. For the phrase results
in Fig. 19, phrase concept numbers are added to the requests and documents
and given a weight of 1.0, equal to the weight of the original concepts in
requests and documents. Phrases perform better than the thesaurus alone on the
IRE-3 collection, and on ADI a small improvement for phrases is also evident.
With the Cran-1 collection, phrases perform a little worse than the thesaurus.
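The phrase procedure described above adds a phrase concept number, with weight 1.0, to any request or document that contains the phrase's components. A minimal sketch follows. It assumes that each phrase is defined as a set of thesaurus concept numbers that must all be present, and that original concepts also carry weight 1.0. The phrase table, the concept numbers and the function name are hypothetical.

```python
from typing import Dict, FrozenSet

Vector = Dict[int, float]  # assumed representation: concept number -> weight


def add_phrases(vec: Vector,
                phrases: Dict[FrozenSet[int], int],
                weight: float = 1.0) -> Vector:
    """Return a copy of vec with a phrase concept added (at the given weight)
    for every phrase whose component concepts all occur in vec."""
    out = dict(vec)  # leave the original vector unchanged
    present = {c for c, w in vec.items() if w > 0}
    for components, phrase_id in phrases.items():
        if components <= present:  # all components must be present
            out[phrase_id] = weight
    return out


# Hypothetical phrase table: concepts 3 and 7 together form phrase concept 500.
phrase_table = {frozenset({3, 7}): 500}
print(add_phrases({3: 1.0, 7: 1.0}, phrase_table))  # phrase concept added
print(add_phrases({3: 1.0, 9: 1.0}, phrase_table))  # no phrase match
```

The same expansion is applied to both requests and documents, so a phrase raises the correlation only when both sides contain all of its components.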
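The normalized evaluation measures used for these comparisons are presumably the normalized recall and normalized precision of the SMART evaluation literature. This sketch assumes those standard definitions. For a collection of N documents with n relevant ones retrieved at ranks r_1, ..., r_n:

R_norm = 1 - (sum r_i - sum i) / (n(N - n))
P_norm = 1 - (sum log r_i - sum log i) / log(N! / (n!(N - n)!))

Here i runs from 1 to n. The function names are illustrative only.

```python
import math
from typing import Sequence


def _check(ranks: Sequence[int], N: int) -> int:
    n = len(ranks)
    if not 0 < n < N:
        raise ValueError("need at least one relevant and one non-relevant document")
    return n


def normalized_recall(ranks: Sequence[int], N: int) -> float:
    """1.0 when all relevant documents are ranked first, 0.0 when ranked last."""
    n = _check(ranks, N)
    ideal = n * (n + 1) // 2  # sum of ranks 1..n
    return 1 - (sum(ranks) - ideal) / (n * (N - n))


def normalized_precision(ranks: Sequence[int], N: int) -> float:
    """Like normalized recall, but on log ranks, so early ranks count more."""
    n = _check(ranks, N)
    actual = sum(math.log(r) for r in ranks)
    ideal = sum(math.log(i) for i in range(1, n + 1))
    # log of the binomial coefficient N! / (n! (N - n)!) via lgamma
    worst = math.lgamma(N + 1) - math.lgamma(n + 1) - math.lgamma(N - n + 1)
    return 1 - (actual - ideal) / worst


print(normalized_recall([1, 10], 10))     # one relevant document first, one last
print(normalized_precision([1, 10], 10))
```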
The hierarchy results in Fig. 20 are based on only one particular series
of searched relations, in which both requests and documents are expanded by
means of the hierarchy, and the new concepts added are given a weight of 1.0,
equal to the weight of the original concepts in the requests and documents. Fig. 20
shows that use of the "Sons", "Brothers" and "Cross References" relations
in the hierarchy results in nearly equivalent or worse performance than the