IRS13
Scientific Report No. IRS-13 Information Storage and Retrieval
Word-Word Associations in Document Retrieval Systems
chapter
M. E. Lesk
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
is shown at the bottom. The associative process moves relevant
documents out of ranks 10-29, but not out of ranks 30-99. It also
moves documents (though far fewer) out of the very low rank positions.
The documents in ranks 10-29 which are promoted up to ranks 1-9 are
generally those which have had their significant terms upweighted by
the process indicated above. The majority of documents promoted from
ranks 100-200 had no significant matching terms before associations
were added. Of the 6 relevant documents which moved up more than 100 rank
positions, all were improved by the recall effect. But in the 10-29 groups,
which lost a total of 38 relevant documents, 25 going up and 15 down,
nearly every case of improvement is due to the precision effect. This
represents a significant improvement in the performance of the relevant
documents in this range. In fact, the largest change of ranks in the
entire table is the promotion of 16 relevant documents from the 10-19 range
to the 1-9 range. In the 100-200 range, a total of 14 documents were
promoted out of this range, while 11 were dropped down into it. Note,
however, that as many documents (10) are dropped from the 20-99 ranges
into the 100-200 range as are promoted from it into them. The net loss
of documents from the 100-200 range is due entirely
to the 4 promotions to rank positions 1-20, all caused by the recall effect.
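
The kind of tabulation discussed above, in which each relevant document is
counted by its rank range before and after associations are added, can be
sketched as follows. This is only an illustration: the range boundaries are
assumed from the ranges named in the text, the names rank_range and
transition_counts are hypothetical, and the rank pairs are made up rather
than taken from the report.

```python
from bisect import bisect_right
from collections import Counter

# Lower bounds of the rank ranges. These are an assumption based on the
# ranges named in the text, not the report's actual table layout.
BOUNDS = [1, 10, 20, 30, 100]
LABELS = ["1-9", "10-19", "20-29", "30-99", "100-200"]


def rank_range(rank):
    """Return the label of the range containing a 1-based rank.

    Ranks above 200 fall into the last range in this sketch.
    """
    if rank < 1:
        raise ValueError("ranks are 1-based")
    return LABELS[bisect_right(BOUNDS, rank) - 1]


def transition_counts(rank_pairs):
    """Count relevant documents by (range before, range after)."""
    return Counter((rank_range(b), rank_range(a)) for b, a in rank_pairs)


# Made-up (rank before, rank after) pairs, for illustration only.
pairs = [(12, 4), (15, 8), (25, 31), (150, 18), (40, 45), (7, 11)]
table = transition_counts(pairs)
for (before, after), n in sorted(table.items()):
    print(f"{before:>8} -> {after:<8} {n}")
```

Each cell of the resulting table is the number of relevant documents that
started in one range and ended in another; cells where the two ranges are the
same count documents that did not leave their range.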
Another way of describing this effect is to note, for each rank
position range, the number of relevant documents promoted as against the
number of relevant documents demoted. This is shown in Table 7. Clearly,
the system operates well near the 20-29 range, and then again at the very
bottom. The last figure in the table is somewhat inflated, since if a
document is already near the bottom, it is difficult to demote it further.
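
A promoted-versus-demoted count of this kind can be sketched as below. The
sketch assumes that a document counts as promoted whenever its rank number
decreases and as demoted whenever it increases; whether the report counted
changes of rank or only changes of range is not stated here. The range
boundaries, the function name promoted_demoted, and the sample ranks are all
illustrative assumptions.

```python
# Assumed inclusive rank ranges, taken from the ranges named in the text.
RANGES = [(1, 9), (10, 19), (20, 29), (30, 99), (100, 200)]


def promoted_demoted(rank_pairs):
    """For each starting range, count documents that moved up or down.

    rank_pairs holds (rank_before, rank_after) for relevant documents.
    A smaller rank number is a better position, so after < before is a
    promotion and after > before is a demotion.
    """
    counts = {r: {"promoted": 0, "demoted": 0} for r in RANGES}
    for before, after in rank_pairs:
        for low, high in RANGES:
            if low <= before <= high:
                if after < before:
                    counts[(low, high)]["promoted"] += 1
                elif after > before:
                    counts[(low, high)]["demoted"] += 1
                break  # each starting rank belongs to exactly one range
    return counts


# Made-up (rank before, rank after) pairs, for illustration only.
pairs = [(12, 4), (15, 8), (25, 31), (150, 18), (190, 195), (7, 11)]
counts = promoted_demoted(pairs)
for (low, high), c in counts.items():
    print(f"{low:>3}-{high:<3} promoted={c['promoted']} demoted={c['demoted']}")
```

Under these assumptions a document at the last rank position can only be
promoted or stay in place, which is the source of the inflation noted for the
bottom range.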