Scientific Report No. IRS-13 Information Storage and Retrieval
Suffix Dictionaries
E. M. Keen
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
normalized precision than normalized recall, as the averages show (Figure 1).
But the precision/recall curve is little affected by dictionary change when
averaged over all requests, as Figure 2(b) shows.
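For readers who want to reproduce the normalized measures referred to here, the following is a minimal sketch. It assumes the usual rank-based definitions of normalized recall and normalized precision, which this section does not restate; the function names and the example ranks are hypothetical.

```python
import math

def normalized_recall(ranks, collection_size):
    """Normalized recall from the ranks of the relevant documents.

    Assumed definition: 1 - (sum(r_i) - sum(i)) / (n * (N - n)),
    where n is the number of relevant documents and N the collection size.
    """
    n, N = len(ranks), collection_size
    if not 0 < n < N:
        raise ValueError("need 0 < number of relevant documents < collection size")
    ideal = sum(range(1, n + 1))  # ranks 1..n, the best possible ordering
    return 1.0 - (sum(ranks) - ideal) / (n * (N - n))

def normalized_precision(ranks, collection_size):
    """Normalized precision from the ranks of the relevant documents.

    Assumed definition: 1 - (sum(log r_i) - sum(log i)) / log(N! / ((N - n)! n!)).
    """
    n, N = len(ranks), collection_size
    if not 0 < n < N:
        raise ValueError("need 0 < number of relevant documents < collection size")
    actual = sum(math.log(r) for r in ranks)
    ideal = sum(math.log(i) for i in range(1, n + 1))
    # lgamma(k + 1) equals log(k!), which avoids computing large factorials.
    log_binom = math.lgamma(N + 1) - math.lgamma(N - n + 1) - math.lgamma(n + 1)
    return 1.0 - (actual - ideal) / log_binom

# Hypothetical request: 3 relevant documents ranked 1, 4 and 7 out of 10.
example_ranks = [1, 4, 7]
example_recall = normalized_recall(example_ranks, 10)
example_precision = normalized_precision(example_ranks, 10)
```

Because the precision measure works on the logarithms of the ranks, it weights changes near the top of the ranked list more heavily than the recall measure does, so the two need not respond equally to a change of dictionary.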
A definite conclusion must await an investigation into the effect
of changes in subject language and the effects of differing methods of request
and relevance decision preparation, since both factors are involved in a
comparison of Cran-1 with the other two collections. Meanwhile, the evidence
presented does point to a difference in language characteristics, and tests
on the larger Cran-2 collection of 1400 documents will shed more light on
this.
5. Conclusions
The comparison of the two suffixing dictionaries shows stem to be
superior on the IRE-3 and ADI collections, and suffix `s' to be superior on
the Cran-1 collection. All differences between the dictionaries are small, and
the use of overlap correlation and logical vectors on the IRE-3 and ADI
collections lessens the superiority of stem; however, the cosine numeric result
is to be preferred to these procedures. The aerodynamics terminology
appears to offer less opportunity for word conflation than the computer science
and documentation terminologies; this remains the primary explanation so far
discovered for the Cran-1 result.
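The matching procedures contrasted above can be illustrated with a minimal sketch. It assumes the cosine correlation in its usual form and an overlap correlation defined as the sum of the smaller weights divided by the smaller of the two vector sums; neither formula is restated in this section, and the concept names and weights below are invented for illustration only.

```python
import math

def cosine(query, document):
    """Cosine correlation between two concept vectors held as dicts."""
    dot = sum(w * document.get(term, 0) for term, w in query.items())
    norm = math.sqrt(sum(w * w for w in query.values())
                     * sum(w * w for w in document.values()))
    return dot / norm if norm else 0.0

def overlap(query, document):
    """Overlap correlation (assumed form): sum of minima over the smaller sum."""
    shared = sum(min(w, document.get(term, 0)) for term, w in query.items())
    denom = min(sum(query.values()), sum(document.values()))
    return shared / denom if denom else 0.0

def logical(vector):
    """Reduce a numeric vector to a logical one: weight 1 for every term present."""
    return {term: 1 for term, w in vector.items() if w > 0}

# Hypothetical numeric vectors; weights stand for concept frequencies.
query = {"wing": 2, "flow": 1}
document = {"wing": 1, "flow": 1, "shock": 3}

cosine_numeric = cosine(query, document)
overlap_numeric = overlap(query, document)
cosine_logical = cosine(logical(query), logical(document))
overlap_logical = overlap(logical(query), logical(document))
```

With logical vectors the weights are discarded, so two stems that differ only in how often they occur are treated alike; under this assumed formulation, that is one way the choice of procedure could narrow the gap between dictionaries.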
Every indication shows that the suffixing dictionaries provide a
convenient and valid base-line from which further dictionaries of the
thesaurus type can be evaluated. However, the use of some type of suffixing
dictionary does provide a good retrieval tool in its own right. Such
dictionaries should be considered both as tools that can be constructed with