Scientific Report No. IRS-13
Information Storage and Retrieval

Search Matching Functions

E. M. Keen
Harvard University
Gerard Salton

Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

3. SMART Test Results

Matching Functions

A) Description of Functions

Retrieval runs made on the SMART system have concentrated on the use of two optional matching functions, known as the overlap correlation coefficient and the cosine correlation coefficient. The task of matching search requests with the documents in the file is viewed in SMART as a vector similarity problem. The individual elements used in the document and request vectors are the individual content identifiers, usually referred to as concepts or concept numbers. For tests comparing the two matching functions, binary vectors are used, in which concepts are either present or absent from a vector; if present, all exert equal weight in the functions.

Since search requests and documents are considered to be simply strings of concept numbers, with no logical relations of the type used in manually formulated searches linking the concepts, only three primary types of data may be incorporated in the matching function:

a) the number of concepts in the request;
b) the number of concepts in the document;
c) the number of concepts that are found both in the request and in the document, i.e. the matching concepts.

The number of matching concepts is used in matching functions of all types, with cosine using both the request and document concepts, and overlap either the request or document concepts, whichever has the smaller total number.

As an example illustrating the two functions, a document vector (b) represented by 18 concept numbers is to be matched against a request vector (a) represented by 8 concept numbers, where 5 concept numbers match:
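The arithmetic for this example can be checked with a short Python sketch. It assumes the usual binary-vector forms of the two coefficients: overlap divides the number of matching concepts by the smaller of the two vector sizes, and cosine divides it by the square root of the product of the two sizes. The function names and the particular concept numbers are illustrative only and are not taken from SMART; they are chosen so that the request has 8 concepts, the document has 18, and 5 match.

```python
import math


def overlap_coefficient(request, document):
    """Overlap for binary vectors: matches / size of the smaller vector."""
    if not request or not document:
        return 0.0  # assumption: an empty vector matches nothing
    matching = len(request & document)
    return matching / min(len(request), len(document))


def cosine_coefficient(request, document):
    """Cosine for binary vectors: matches / sqrt(|request| * |document|)."""
    if not request or not document:
        return 0.0  # assumption: an empty vector matches nothing
    matching = len(request & document)
    return matching / math.sqrt(len(request) * len(document))


# Hypothetical concept numbers, picked only to reproduce the counts
# in the text: 8 request concepts, 18 document concepts, 5 in common.
request = set(range(1, 9))     # concepts 1..8
document = set(range(4, 22))   # concepts 4..21

print(f"overlap = {overlap_coefficient(request, document):.4f}")  # 0.6250
print(f"cosine  = {cosine_coefficient(request, document):.4f}")   # 0.4167
```

Under these assumptions the overlap value is 5/8 and the cosine value is 5/12, so overlap scores the same pair higher because it ignores the size of the longer vector.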