IRS13
Scientific Report No. IRS-13 Information Storage and Retrieval
Search Matching Functions
chapter
E. M. Keen
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
using equations (1) and (2) already given;
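\[
\text{OVERLAP} = \frac{1+1+1+1+2}{\min(12,\ 39)} = \frac{6}{12} = 0.50
\]
\[
\text{COSINE} = \frac{\text{(numerator not legible)}}{\sqrt{(1^2+2^2+1^2+1^2+1^2+3^2+1^2+2^2) \times (1^2+3^2+1^2+4^2+2^2+1^2+3^2+7^2+1^2+1^2+2^2+3^2+5^2+1^2+1^2+1^2+1^2+1^2)}}
= \frac{\text{(numerator not legible)}}{\sqrt{22 \times 135}} = 0.28
\]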
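The arithmetic of this worked example can be checked with a short sketch. The weight lists below are read from the example itself; one document weight is illegible in the source and is assumed to be 4, because that value reproduces both stated totals (39 and 135). The variable names are illustrative only, and the cosine numerator is not recomputed because its terms are not legible.

```python
import math

# Request weights as read from the worked example.
request_weights = [1, 2, 1, 1, 1, 3, 1, 2]
# Document weights as read from the worked example. The fourth value is an
# ASSUMPTION (illegible in the source); 4 reproduces the totals 39 and 135.
document_weights = [1, 3, 1, 4, 2, 1, 3, 7, 1, 1, 2, 3, 5, 1, 1, 1, 1, 1]
# Lowest weight on each of the five matching concepts (overlap numerator terms).
matching_minima = [1, 1, 1, 1, 2]

# Overlap: sum of the lower matching weights over the smaller weight total.
overlap = sum(matching_minima) / min(sum(request_weights), sum(document_weights))

# Cosine denominator: square root of the product of the sums of squared weights.
request_sq = sum(w * w for w in request_weights)
document_sq = sum(w * w for w in document_weights)
cosine_denominator = math.sqrt(request_sq * document_sq)

print(overlap, request_sq, document_sq, round(cosine_denominator, 2))
```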
It should be noted that the overlap numerator uses the lowest weight assigned
to a given matching concept in a request and a document; thus, for requests
in which all the concepts are assigned a weight of 1 only, none of the weighted
concepts in the document can exert any influence on the final correlation.
In this situation, which is quite common for the requests tested, the weighted
vector result is then identical to the unweighted result for overlap. For
this reason, the comparison of overlap and cosine was made using unweighted
(logical) vectors; comparisons of the weighted versus unweighted (numeric
versus logical) vectors will use the cosine correlation coefficient.
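A minimal sketch of this point follows, assuming the overlap and cosine coefficients take the forms implied by the worked example (lowest matching weight over the smaller weight total, and the usual cosine of two weight vectors). The concept names, the weights, and the helper `to_logical` are hypothetical. With every request weight equal to 1, the document weights drop out of the overlap value but still change the cosine value.

```python
import math

def overlap(request, document):
    """Sum of the lower weights on shared concepts over the smaller weight total."""
    shared = request.keys() & document.keys()
    numerator = sum(min(request[c], document[c]) for c in shared)
    return numerator / min(sum(request.values()), sum(document.values()))

def cosine(request, document):
    """Cosine correlation of two concept-weight vectors."""
    shared = request.keys() & document.keys()
    numerator = sum(request[c] * document[c] for c in shared)
    norm = math.sqrt(sum(w * w for w in request.values())
                     * sum(w * w for w in document.values()))
    return numerator / norm

def to_logical(vector):
    """Replace every weight by 1 (hypothetical helper for the unweighted case)."""
    return {concept: 1 for concept in vector}

# Hypothetical data: a request whose concepts all carry weight 1, and a
# weighted document with at least as many concepts as the request.
request = {"a": 1, "b": 1, "c": 1}
document = {"a": 5, "b": 2, "d": 3, "e": 1}

weighted_overlap = overlap(request, document)
logical_overlap = overlap(request, to_logical(document))
weighted_cosine = cosine(request, document)
logical_cosine = cosine(request, to_logical(document))

print(weighted_overlap == logical_overlap)  # overlap ignores the document weights here
print(weighted_cosine == logical_cosine)    # cosine does not
```

In this sketch the equality for overlap relies on the document having at least as many concepts as the request, so that the request total is the smaller denominator in both the weighted and the unweighted case.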
Weighting achieved by manual or semi-manual decisions may also
become a part of automatic retrieval systems of the future, under the
assumption that such methods do not require large amounts of time and effort,
and give useful improvements in performance. An example of this, using selec-
tive request weighting to improve vital request notions, is given in section
VIII.
B) Retrieval Performance Results
Thirteen comparison runs on three collections are presented in
Figure 17, evaluated by normalized recall and normalized precision. All IRE-3
results show numeric to be superior to logical, and most of the runs on Cran-1