<DOC> 
<DOCNO> IRS13 </DOCNO>         
<TITLE> Scientific Report No. IRS-13 Information Storage and Retrieval </TITLE>         
<SUBTITLE> Search Matching Functions </SUBTITLE>         
<TYPE> chapter </TYPE>         
<PAGE CHAPTER="3" NUMBER="30">                   
<AUTHOR1> E. M. Keen </AUTHOR1>  
<PUBLISHER> Harvard University </PUBLISHER> 
<EDITOR1> Gerard Salton </EDITOR1> 
<COPYRIGHT MTH="December" DAY="" YEAR="1967" BY="National Science Foundation">   
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government. 
</COPYRIGHT> 
<BODY> 
 III-30

 using equations (1) and (2) already given;


 \[ \text{OVERLAP} = \frac{1+1+1+1+2}{\min(12,\ 39)} = \frac{6}{12} = 0.50 \]

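 \[ \text{COSINE} = \frac{\text{sum of products of matching weights}}{\sqrt{(1^2+2^2+1^2+1^2+1^2+3^2+1^2+2^2) \times (1^2+3^2+1^2+4^2+2^2+1^2+3^2+7^2+1^2+1^2+2^2+3^2+5^2+1^2+1^2+1^2+1^2+1^2)}} \]

 \[ \phantom{\text{COSINE}} = \frac{\text{sum of products of matching weights}}{\sqrt{22 \times 135}} = 0.28 \]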
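
 The two coefficients can be sketched in Python as follows. This is only a
 minimal illustration, assuming that each concept vector is stored as a
 dictionary mapping a concept identifier to its weight; the function names and
 the sample vectors are hypothetical and are not taken from the report. The
 overlap numerator sums the lower of the two weights for each matching concept
 and the denominator is the smaller of the two weight totals; the cosine
 divides the sum of products of matching weights by the square root of the
 product of the two sums of squared weights.

```python
import math


def overlap(request, document):
    """Sum of the lower weight of each shared concept, divided by
    the smaller of the two weight totals."""
    shared = set(request).intersection(document)
    numerator = sum(min(request[c], document[c]) for c in shared)
    denominator = min(sum(request.values()), sum(document.values()))
    return numerator / denominator if denominator else 0.0


def cosine(request, document):
    """Sum of products of shared weights, divided by the square root
    of the product of the two sums of squared weights."""
    shared = set(request).intersection(document)
    numerator = sum(request[c] * document[c] for c in shared)
    request_squares = sum(w * w for w in request.values())
    document_squares = sum(w * w for w in document.values())
    denominator = math.sqrt(request_squares * document_squares)
    return numerator / denominator if denominator else 0.0


# Hypothetical vectors, chosen only for illustration.
request = {"a": 1, "b": 2, "c": 1}
document = {"a": 3, "b": 1, "d": 2}

print(overlap(request, document))
print(cosine(request, document))
```

 In this sketch the shared concepts are "a" and "b", so the overlap numerator
 is min(1, 3) + min(2, 1) = 2 and the denominator is min(4, 6) = 4.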

 It should be noted that the overlap numerator uses the lowest weight assigned

 to a given matching concept in a request and a document; thus, for requests

 in which all the concepts are assigned a weight of 1 only, none of the weighted

 concepts in the document can exert any influence on the final correlation.

 In this situation, which is quite common for the requests tested, the weighted

 vector result is then identical to the unweighted result for overlap.   For

 this reason, the comparison of overlap and cosine was made using unweighted

 (logical) vectors; comparisons of the weighted versus unweighted (numeric

 versus logical) vectors will use the cosine correlation coefficient.
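
 The effect of unit request weights on overlap can be checked with a short
 sketch. The vectors and the helper names `overlap` and `to_logical` are
 hypothetical, and the example assumes that the request total is smaller than
 the document total, so that the request sets the denominator in both cases.
 With every request weight equal to 1, each matching concept contributes
 min(1, w) = 1 to the numerator whatever the document weight w, so the
 weighted and unweighted overlap values coincide.

```python
def overlap(request, document):
    """Sum of the lower weight of each shared concept, divided by
    the smaller of the two weight totals."""
    shared = set(request).intersection(document)
    numerator = sum(min(request[c], document[c]) for c in shared)
    denominator = min(sum(request.values()), sum(document.values()))
    return numerator / denominator if denominator else 0.0


def to_logical(vector):
    """Replace every nonzero weight by 1 (unweighted, logical vector)."""
    return {concept: 1 for concept, weight in vector.items() if weight}


# Hypothetical request in which every concept has weight 1,
# and a hypothetical weighted (numeric) document vector.
request = {"a": 1, "b": 1, "c": 1}
document = {"a": 5, "b": 2, "d": 3, "e": 1}

weighted = overlap(request, document)
unweighted = overlap(request, to_logical(document))

print(weighted, unweighted)
```

 Both calls give a numerator of 2 and a denominator of 3, since the document
 weights of 5 and 2 are cut down to the request weight of 1.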

          Weighting achieved by manual or semi-manual decisions may also

 become a part of automatic retrieval systems of the future, under the

 assumption that such methods do not require large amounts of time and effort,

 and give useful improvements in performance.  An example of this, using selec-

 tive request weighting to improve vital request notions, is given in section

 VIII.


          B)  Retrieval Performance Results

          Thirteen comparison runs on three collections are presented in

 Figure 17, evaluated by normalized recall and normalized precision.  All IRE-3

 results show numeric to be superior to logical, and most of the runs on Cran-1

</BODY>                  
</PAGE>                  
</DOC>