Scientific Report No. IRS-13 Information Storage and Retrieval
Search Matching Functions
E. M. Keen
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
Comparisons of individual request merit are given in Figure 8, where
58.8% to 76.5% of the requests favor cosine on IRE-3, 70.0% to 85.7% favor
cosine on Cran-1, and 50% to 80% favor cosine on ADI. With the thesaurus
dictionary, which gives a performance superior to stem on all collections,
cosine is likewise clearly superior to overlap.
C) Analysis of Performance
Since cosine consistently performs better than overlap, an adequate
explanation must be sought. It should be noted that although the average
results show cosine to be superior by only around 5% in precision, individual
requests and relevant documents may show large changes in favor of either cosine
or overlap. Figures 9 and 10 show results which strongly favor cosine
and overlap, respectively, for two individual requests. The individual
relevant documents display large changes in rank with change in correlation
coefficient. Using the Cran-1 Stem results, it is found that of 198 documents
relevant to all 42 requests, 95 show rank improvements on cosine over
overlap, 62 show the reverse improvement, and 41 show no change in rank. Figure
11 shows the amounts of change in rank for the 95 and 62 documents, revealing
that the advantage is, as expected, with cosine.
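The rank comparison described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the procedure used in the experiments: the section does not state the formulas, so the min-based overlap coefficient and the standard cosine correlation are assumed, and the toy query, document names, and relevance judgments are hypothetical.

```python
import math

def overlap(q, d):
    """Assumed overlap coefficient: summed minimum weights over the smaller total weight."""
    num = sum(min(w, d.get(t, 0.0)) for t, w in q.items())
    den = min(sum(q.values()), sum(d.values()))
    return num / den if den else 0.0

def cosine(q, d):
    """Cosine correlation: inner product over the product of vector lengths."""
    num = sum(w * d.get(t, 0.0) for t, w in q.items())
    den = math.sqrt(sum(w * w for w in q.values())) * math.sqrt(sum(w * w for w in d.values()))
    return num / den if den else 0.0

def ranks(query, docs, score):
    """Rank documents by decreasing score; ties are broken by document name."""
    ordered = sorted(docs, key=lambda name: (-score(query, docs[name]), name))
    return {name: i + 1 for i, name in enumerate(ordered)}

def tally_rank_changes(query, docs, relevant):
    """Count relevant documents ranked better, worse, or the same on cosine versus overlap."""
    r_ov = ranks(query, docs, overlap)
    r_cos = ranks(query, docs, cosine)
    better = sum(1 for d in relevant if r_cos[d] < r_ov[d])
    worse = sum(1 for d in relevant if r_cos[d] > r_ov[d])
    return better, worse, len(relevant) - better - worse

# Hypothetical toy data: binary term vectors stored as {term: weight}.
query = {"a": 1.0, "b": 1.0, "c": 1.0}
docs = {
    "d1_long": {t: 1.0 for t in "abcdefghi"},  # long non-relevant document
    "d2_short": {"a": 1.0, "b": 1.0},          # short relevant document
    "d3": {"a": 1.0, "x": 1.0},                # weakly matching relevant document
}
relevant = ["d2_short", "d3"]

print(tally_rank_changes(query, docs, relevant))  # (1, 0, 1)
```

In this toy case overlap scores the long and the short document identically, whereas cosine penalizes the long document and moves the short relevant one up a rank. Applied to all 198 relevant documents, the same tally would have to reproduce the three counts reported above, which sum as 95 + 62 + 41 = 198.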
Figure 12 gives a diagrammatic representation of what is happening
when the ranking induced by overlap is changed to cosine. Overlap orders
documents by match alone; to simplify the diagram, only five matching strengths
are recorded. At each matching strength both relevant and non-relevant
documents may be found intermingled. If the ordering induced by cosine is
now imposed on the documents, two types of changes take place. Firstly,
some non-relevant documents are decreased in match and rank position, and
some relevant documents are increased in match and rank position; naturally,
such changes favor cosine as opposed to overlap. Secondly, the changes that