IRS13
Scientific Report No. IRS-13 Information Storage and Retrieval
Search Matching Functions
chapter
E. M. Keen
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
range of weights in the vectors, that is, combinations of words may cause
some concepts to be very highly weighted, and rare examples of weights in
excess of 100 have been noted.
The superiority of numeric on Cran-1 Stem, which was left in some
doubt on the average measures (although not in the individual request figures),
is only marginally established by looking at the 198 individual relevant
documents separately. Some 41 of the documents have identical ranks on numeric
and logical, while 85 have better ranks on numeric, and 72 have better ranks
on logical. The 85 that are better on numeric show larger increases in the
rank positions involved, as shown in Figure 25.
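The per-document comparison described above can be sketched in a few lines of Python. This is only an illustrative tally, not the procedure used in the report: the function name `tally_rank_changes`, the document identifiers, and the rank values are all hypothetical, and a smaller rank number is assumed to mean a better position.

```python
from collections import Counter


def tally_rank_changes(numeric_ranks, logical_ranks):
    """Count documents ranked identically, better on numeric, or better on logical.

    Both arguments map a document identifier to its rank position (1 = top).
    """
    tally = Counter(identical=0, numeric_better=0, logical_better=0)
    for doc_id, numeric_rank in numeric_ranks.items():
        logical_rank = logical_ranks[doc_id]
        if numeric_rank == logical_rank:
            tally["identical"] += 1
        elif numeric_rank < logical_rank:  # smaller rank number = ranked higher
            tally["numeric_better"] += 1
        else:
            tally["logical_better"] += 1
    return dict(tally)


# Hypothetical ranks for four relevant documents (not taken from the report).
numeric = {"d1": 3, "d2": 12, "d3": 7, "d4": 40}
logical = {"d1": 9, "d2": 12, "d3": 5, "d4": 55}
result = tally_rank_changes(numeric, logical)
```

Applied to the 198 relevant documents, a tally of this kind yields the three groups quoted in the text, which together must account for every document.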
These large-scale changes that work in both directions, some favoring
numeric and some logical, are illustrated for one individual request by the
data in Figure 26. Six of the ten highest ranked documents on logical receive
rank positions below 10 in numeric; this large change in document ranks
favors numeric in this example. In order to determine how the weighting
scheme is used to achieve a more effective discrimination between relevant and
non-relevant documents, further data on these 17 documents are given in
Figures 27, 28, 29 and 30.
Figure 27 shows the ordering resulting from logical (cosine), giving
correlations, matching concepts and total document concepts. Figure 28
gives the numeric ordering, together with data about the matching concepts
and document concepts from which the final correlation is derived. For
example, the correlation given to document [OCRerr]20 (relevant) of 0.41421 is
derived from:
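\[
\text{Cosine Numeric Correlation}
= \frac{\sum (\text{matching concept doct. weights} \times \text{request weights})}
       {\sqrt{(\text{sum of squares of req. wts.}) \times (\text{sum of squares of doct. wts.})}}
= 0.4421
\]
(Denominator terms as printed: 1158433 x 1235,424; numerator illegible.)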
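The cosine correlation described above, and its logical counterpart in which every weight is reduced to 1, can be sketched as follows. This is a minimal illustration under stated assumptions: vectors are held as Python dictionaries from concept to weight, the helper names `cosine_correlation` and `to_logical` are hypothetical, and the sample concepts and weights are invented rather than taken from Figures 27 to 30.

```python
import math


def cosine_correlation(request, document):
    """Cosine correlation between two concept-weight vectors held as dicts."""
    matching = set(request) & set(document)
    # Numerator: sum over matching concepts of document weight x request weight.
    numerator = sum(request[c] * document[c] for c in matching)
    # Denominator: square root of the product of the two sums of squares.
    denominator = math.sqrt(
        sum(w * w for w in request.values())
        * sum(w * w for w in document.values())
    )
    return numerator / denominator if denominator else 0.0


def to_logical(vector):
    """Reduce a weighted vector to a logical one (weight 1 for each concept present)."""
    return {concept: 1 for concept, weight in vector.items() if weight > 0}


# Hypothetical vectors (concept names and weights are invented).
request = {"flow": 24, "wing": 12}
document = {"flow": 12, "wing": 12, "heat": 36}

numeric_corr = cosine_correlation(request, document)
logical_corr = cosine_correlation(to_logical(request), to_logical(document))
```

In this invented case the heavily weighted non-matching concept lowers the numeric correlation well below the logical one, which is the kind of shift in ranking that the weighting scheme can produce in either direction.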