Scientific Report No. IRS-13
Information Storage and Retrieval
Search Matching Functions (chapter)
E. M. Keen
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

Examining Figures 26, 27, and 28, the following individual cases may be noted:

1. Relevant documents that are long receive low ranks on logical cosine unless very highly matched. The great length of document 797 is offset on numeric cosine by the very high weights associated with the matching concepts.

2. Relevant documents having few matching concepts, which logical cosine ranks below certain higher-matching non-relevant documents, receive improved ranks on numeric cosine when the matching concepts are highly weighted (see documents 1420 and 7914). When the matching concepts of relevant documents are not highly weighted, the numeric measure usually worsens their rank positions (see documents 793, 795 and 796).

From these data two hypotheses emerge. First, if a relevant and a non-relevant document have similar numbers of matching concepts or similar rank positions using logical cosine, the introduction of weights will on average result in higher matches for the relevant than for the non-relevant documents. It seems reasonable that low-weighted matching concepts should have a higher probability of reflecting a trivial occurrence of those concepts in the document than is the case for concepts that are highly weighted.

The second hypothesis is that weights assigned to the matching concepts provide some measure of discrimination between concepts according to their importance; this discrimination is of value in matching relevant documents. In such cases spurious matches with many concepts are distinguished from correct matches even if the latter are obtained with fewer concepts.
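The effect described in the two cases above can be illustrated with a minimal sketch. It assumes that "logical" cosine is the cosine correlation computed on vectors in which every concept present has weight 1, and that "numeric" cosine uses the concept weights themselves. The concept names, the weights, and the helper names `cosine` and `logical` are hypothetical and are not taken from the report or from its test collection.

```python
import math


def cosine(query, doc):
    """Cosine correlation of two concept vectors given as {concept: weight} dicts."""
    # Numerator: sum of weight products over the matching concepts only.
    numerator = sum(w * doc[c] for c, w in query.items() if c in doc)
    q_norm = math.sqrt(sum(w * w for w in query.values()))
    d_norm = math.sqrt(sum(w * w for w in doc.values()))
    if q_norm == 0 or d_norm == 0:
        return 0.0
    return numerator / (q_norm * d_norm)


def logical(vector):
    """Discard the weights: every concept that is present gets weight 1."""
    return {c: 1 for c, w in vector.items() if w > 0}


# Hypothetical request with three equally weighted concepts.
query = {"a": 12, "b": 12, "c": 12}

# Hypothetical relevant document: long (8 concepts), only 2 matching concepts,
# but those two are highly weighted.
relevant = {"a": 48, "b": 36, "d": 12, "e": 12,
            "f": 12, "g": 12, "h": 12, "i": 12}

# Hypothetical non-relevant document: short (5 concepts), 3 matching concepts,
# all with low weights; its heavy concepts do not match the request.
non_relevant = {"a": 12, "b": 12, "c": 12, "x": 48, "y": 36}

for name, doc in (("relevant", relevant), ("non-relevant", non_relevant)):
    log_score = cosine(logical(query), logical(doc))
    num_score = cosine(query, doc)
    print(f"{name:13s} logical={log_score:.3f} numeric={num_score:.3f}")
```

With these invented vectors, logical cosine places the non-relevant document above the relevant one because it has more matching concepts and is shorter, while numeric cosine reverses the order because the relevant document's matching concepts carry high weights. This is only a constructed instance of the behaviour the hypotheses describe, not evidence for them.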
Evidence that the first hypothesis holds for request 0137 is given in Figure 29, showing that the change from logical to numeric produces far better cosine correlation values in the numerator for relevant documents compared with the non-relevant documents. In this example, numeric also