Scientific Report No. IRS-13
Information Storage and Retrieval
Chapter: Search Matching Functions
E. M. Keen
Harvard University
Gerard Salton

Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

$$\text{OVERLAP} = \frac{\sum \min(a, b)}{\min\left(\sum a, \sum b\right)} = \frac{5}{\min(8, 18)} = \frac{5}{8} = 0.63 \tag{1}$$

$$\text{COSINE} = \frac{\sum a \cdot b}{\sqrt{\sum a^{2} \cdot \sum b^{2}}} = \frac{5}{12} = 0.42 \tag{2}$$

Both functions are designed for use with weighted concept numbers, and their use in this manner is illustrated in part 4.

Since in the tests carried out, the requests are generally shorter than the documents (except occasionally when title runs are being made), the overlap function in documentary terms measures the inclusion of the request terms in the document only. Thus, if a request with eight concepts matches five of them in several documents, all such documents will receive identical correlations with the request.

The cosine function measures the similarity of the total request to the total document, and non-matching concepts in both requests and documents affect the final correlation. Thus, for a request that matches five out of eight concepts in several documents, the document that has the fewest non-matching concepts will receive the highest correlation.

Cosine thus takes into account document length, following the principle that if two documents have equal request/document matching concepts, the shorter document has a higher probability of being useful to the requestor, since it will contain less extraneous material. In documentary terms this principle seems of doubtful validity, since a requestor may be equally satisfied by treatment of the requested topic in a long document as in a short one.
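
The contrast between the two functions can be checked numerically. The following is a minimal Python sketch, not the original system's implementation. It assumes binary (unweighted) concept vectors, takes overlap to be the sum of element-wise minima divided by the smaller of the two vector sums, and takes cosine to be the usual normalized inner product. The names `overlap`, `cosine`, and `binary_pair` are hypothetical helpers introduced only for illustration, and the second document of 10 concepts is an assumed comparison case.

```python
import math


def overlap(a, b):
    """Sum of element-wise minima over the smaller of the two vector sums."""
    denom = min(sum(a), sum(b))
    if denom == 0:
        return 0.0
    return sum(min(x, y) for x, y in zip(a, b)) / denom


def cosine(a, b):
    """Inner product divided by the product of the vector lengths."""
    denom = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    if denom == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / denom


def binary_pair(n_request, n_document, n_matching):
    """Hypothetical helper: build binary request/document vectors.

    Layout of the shared concept space: matching concepts first,
    then request-only concepts, then document-only concepts.
    """
    req_only = n_request - n_matching
    doc_only = n_document - n_matching
    a = [1] * n_matching + [1] * req_only + [0] * doc_only
    b = [1] * n_matching + [0] * req_only + [1] * doc_only
    return a, b


# Request of 8 concepts, document of 18 concepts, 5 in common.
request, document = binary_pair(8, 18, 5)
print(f"{overlap(request, document):.3f}")  # 0.625
print(f"{cosine(request, document):.3f}")   # 0.417

# Assumed comparison case: same request, same 5 matches, shorter document.
short_request, short_document = binary_pair(8, 10, 5)
print(f"{overlap(short_request, short_document):.3f}")  # 0.625
print(f"{cosine(short_request, short_document):.3f}")   # 0.559
```

Under these assumptions the overlap value is the same for both documents, since only the five matched request concepts and the request length enter the calculation, while the cosine value is higher for the shorter document.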