Scientific Report No. IRS-13
Information Storage and Retrieval
Search Matching Functions (chapter)
E. M. Keen
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

... produces better denominator values as well for the relevant documents (see Figure 30); no explanation for this phenomenon is suggested. Support for the second hypothesis is obtained in Figure 31, where the Cranfield request on ablation is again used as an example. The two relevant documents have poor matches with the request, but since the matching concept is the most important request word, weights of [illegible] are derived from the frequency of occurrence in the document; a non-relevant document with more matching concepts, but spurious ones, is ranked below the relevant documents when the weights are in use.

5. Conclusions and Suggested Further Studies

A matching function that consists of the cosine correlation with numeric vectors has been shown to be nearly always superior to the use of either the overlap correlation or logical vectors. A simplified table of results, using precision versus recall graphs, normalized measures, and individual requests, is given in Figure 32.

The cosine correlation coefficient works better than the overlap coefficient because it includes a factor for document length. Among non-relevant documents there is a strong correlation between the number of matching concepts and the length of the document, so this factor reduces the request/document correlation for a number of the highly matched non-relevant documents.

The superiority of weighted concepts, evidenced by the superiority of numeric as opposed to logical vectors, is due to two reasons.
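
As an aside before the two reasons are taken up, the contrast between the two coefficients and the two vector types can be made concrete with a minimal Python sketch. It assumes the usual definitions, which this section does not restate: cosine as the inner product divided by the product of the vector lengths, and overlap as the sum of the smaller of each pair of weights divided by the smaller of the two weight totals. The function names, concept names, and weights are all hypothetical and are not taken from the Cranfield data.

```python
import math

def cosine(q, d):
    """Cosine correlation between two concept-weight dictionaries."""
    dot = sum(w * d.get(c, 0.0) for c, w in q.items())
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    norm_d = math.sqrt(sum(w * w for w in d.values()))
    return dot / (norm_q * norm_d) if norm_q and norm_d else 0.0

def overlap(q, d):
    """Overlap correlation (assumed form): shared weight over the smaller total."""
    shared = sum(min(w, d.get(c, 0.0)) for c, w in q.items())
    denom = min(sum(q.values()), sum(d.values()))
    return shared / denom if denom else 0.0

def to_logical(v):
    """Reduce a numeric vector to a logical one (weight 1 for each concept present)."""
    return {c: 1.0 for c, w in v.items() if w > 0}

# Hypothetical request: "ablation" is the most important concept.
request = {"ablation": 2.0, "heat": 1.0}

# Hypothetical short document: one matching concept, occurring often.
short_doc = {"ablation": 3.0, "shield": 1.0}

# Hypothetical long document: two matching concepts, each occurring once,
# among eight other unrelated concepts.
long_doc = {"ablation": 1.0, "heat": 1.0}
long_doc.update({f"other{i}": 1.0 for i in range(8)})

for name, doc in (("short", short_doc), ("long", long_doc)):
    print(name,
          "overlap/logical=%.3f" % overlap(to_logical(request), to_logical(doc)),
          "cosine/logical=%.3f" % cosine(to_logical(request), to_logical(doc)),
          "cosine/numeric=%.3f" % cosine(request, doc))
```

In this toy case the overlap coefficient with logical vectors ranks the long document first, because its denominator is set by the short request and takes no account of document length. The cosine coefficient penalizes the long document for its length, and numeric weights widen the gap further in favor of the document where the important concept occurs frequently.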
The first is that highly weighted matching concepts tend to distinguish between important and trivial occurrences of those concepts in the documents, and thus tend to make better distinctions between relevant and non-relevant documents. The second reason is that if different concepts in a request receive different weights, such weighting does discriminate between vital