Scientific Report No. IRS-13 Information Storage and Retrieval

Suffix Dictionaries
E. M. Keen
Harvard University
Gerard Salton

Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

normalized precision than normalized recall, as the averages show (Figure 1). But the precision/recall curve is little affected by dictionary change when averaged over all requests, as Figure 2(b) shows. A definite conclusion must await an investigation into the effect of changes in subject language and the effects of differing methods of request and relevance decision preparation, since both factors are involved in a comparison of Cran-1 with the other two collections. Meanwhile, the evidence presented does point to a difference in language characteristics, and tests on the larger Cran-2 collection of 1400 documents will shed more light on this.

5. Conclusions

The comparison of the two suffixing dictionaries shows stem to be superior on the IRE-3 and ADI collections, and suffix 's' to be superior on the Cran-1 collection. All differences between dictionaries are small, and the use of overlap correlation and logical vectors on the IRE-3 and ADI collections lessens the superiority of stem; however, the cosine numeric result is to be preferred to these procedures. The aerodynamics terminology appears to offer less opportunity for word conflation than the computer science and documentation terminologies; this remains the primary explanation so far discovered for the Cran-1 result.

Every indication shows that the suffixing dictionaries provide a convenient and valid base-line from which further dictionaries of the thesaurus type can be evaluated. However, the use of some type of suffixing dictionary does provide a good retrieval tool in its own right. Such dictionaries should be considered both as tools that can be constructed with