Scientific Report No. IRS-13: Information Storage and Retrieval
Suffix Dictionaries (chapter)
E. M. Keen
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
These data show that full suffixing (the stem dictionary) does not affect
as many aerodynamics words as it does computer science and documentation words,
thus giving the Cran-1 collection less opportunity for a change in
retrieval performance. Further explanation can be found by observing
individual request performance for the seven requests whose performance
changes by more than 0.05 in normalized recall (see Figure 6); four of these
requests are better on stem, and three are better on suffix 's'. An analysis
of the three requests that favor suffix 's' reveals certain test problems,
connected mainly with hyphenation and keypunch errors; in fact, the
request/relevant document match is very weak in all cases. The many requests that
favor suffix 's' by a trivial amount (Figure 6) are typified by request
Q269, details of which appear in Figure 11. As Figure 11 shows, the stem
"compress" incorrectly matches the request word "compressor" with the frequently
used word "compressible" in two non-relevant documents, so the stem
dictionary performs worse: relevant document 1590 receives
a rank position below those two non-relevant documents. Although the matching
words of non-relevant document 198[?] might appear to call the relevance
decision into question (the abstract includes the topic of "choked flow in an
impeller inlet"), the title makes clear that the document concerns a "centrifugal
impeller", and the matching word "axial" is a curious match from the phrase "axial [?]etry".
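The false match between "compressor" and "compressible" comes from how aggressively each dictionary removes endings. The sketch below contrasts the two behaviours under stated assumptions: `suffix_s` strips only a final plural "s", and `full_stem` additionally removes the longest ending from a small hypothetical suffix list. Both function names and the suffix list are illustrative inventions, not the suffix rules actually used in these experiments.

```python
# Hypothetical suffix list for a "full suffixing" (stem) dictionary.
# Illustrative only; NOT the suffix list used in the reported experiments.
STEM_SUFFIXES = ("ible", "able", "ing", "or", "er", "ed")


def suffix_s(word: str) -> str:
    """Suffix 's' style: strip only a final plural 's' (crude, keeps 'ss')."""
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def full_stem(word: str, min_stem: int = 3) -> str:
    """Stem style: plural handling, then remove the longest matching suffix,
    provided at least `min_stem` letters remain."""
    word = suffix_s(word)
    for suffix in sorted(STEM_SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


if __name__ == "__main__":
    for w in ("compressor", "compressors", "compressible"):
        print(w, "->", suffix_s(w), "|", full_stem(w))
```

Under the suffix 's' rule, "compressor" and "compressible" stay distinct, while "compressors" still matches "compressor". Under the stem rule all three collapse to "compress", which is exactly the kind of conflation that lets non-relevant documents containing "compressible" outrank a relevant one.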
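Whether a request counts as "changed" depends on the difference in normalized recall between the two dictionaries. A minimal sketch, assuming Salton's usual definition R_norm = 1 - (sum of r_i - sum of i) / (n(N - n)), where r_i are the ranks of the n relevant documents in a collection of N documents. The function names, the collection size N = 200, and the ranks in the example are hypothetical, chosen only to illustrate the arithmetic.

```python
from typing import Sequence, Tuple


def normalized_recall(relevant_ranks: Sequence[int], n_docs: int) -> float:
    """R_norm = 1 - (sum(r_i) - sum(i)) / (n * (N - n)).

    1.0 means all relevant documents occupy the top n ranks;
    0.0 means they occupy the bottom n ranks.
    """
    ranks = sorted(relevant_ranks)
    n = len(ranks)
    if n == 0 or n >= n_docs:
        raise ValueError("need 0 < n < N relevant documents")
    ideal = n * (n + 1) // 2  # sum of ranks 1..n: best possible placement
    return 1 - (sum(ranks) - ideal) / (n * (n_docs - n))


def significant_change(ranks_a: Sequence[int], ranks_b: Sequence[int],
                       n_docs: int, threshold: float = 0.05) -> Tuple[bool, float]:
    """Compare two rankings of the same relevant set; flag differences above threshold."""
    diff = normalized_recall(ranks_a, n_docs) - normalized_recall(ranks_b, n_docs)
    return abs(diff) > threshold, diff


if __name__ == "__main__":
    # Hypothetical: one relevant document, rank 1 under one dictionary,
    # rank 3 (below two non-relevant documents) under the other; N = 200 assumed.
    flag, diff = significant_change([1], [3], n_docs=200)
    print(flag, round(diff, 4))
```

Dropping a single relevant document by two places in a 200-document collection changes normalized recall by only about 0.01, well under the 0.05 threshold, which is why such requests register as differing by "a trivial amount".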
The ranks of the seven documents relevant to request Q190 are given
in Figure 12 because the observed changes in rank position are typical of
what happens to the averages. The highest-ranked relevant document remains
unchanged at position 1, so the high-precision end of the curve for this
request is unaffected. The next three relevant documents are ranked
better by suffix 's', but the final three are ranked better by
stem. This result gives a greater advantage to suffix 's' in