Scientific Report No. IRS-13 Information Storage and Retrieval
Search Matching Functions
E. M. Keen
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
3. SMART Test Results Matching Functions
A) Description of Functions
Retrieval runs made on the SMART system have concentrated on the
use of two optional matching functions, known as the overlap correlation
coefficient and the cosine correlation coefficient. The task of matching search
requests with the documents in the file is viewed in SMART as a vector
similarity problem. The individual elements used in the document and request
vectors are the individual content identifiers usually referred to as concepts
or concept numbers. For tests comparing the two matching functions, binary
vectors are used, in which concepts are either present or absent from a
vector; if present, all exert equal weight in the functions.
Since search requests and documents are considered to be simply
strings of concept numbers, with no logical relations of the type used in
manually formulated searches linking the concepts, only three primary types
of data may be incorporated in the matching function:
a) the number of concepts in the request;
b) the number of concepts in the document;
c) the number of concepts that are found both in the request and
in the document, i.e. the matching concepts.
The number of matching concepts is used in matching functions of all types.
The cosine function combines it with both the request and the document concept
counts, whereas the overlap function combines it with only one of the two,
whichever count is smaller.
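
The way the three quantities enter each function can be sketched in a few lines of Python. This is an illustrative sketch only, not the SMART implementation: it assumes the standard binary-vector forms of the two coefficients (matching concepts divided by the smaller concept count for overlap, and by the square root of the product of the two counts for cosine). The function names, the use of Python sets to stand for binary concept vectors, the zero score returned for an empty vector, and the sample concept numbers are all assumptions made for the illustration.

```python
import math


def overlap_coefficient(request, document):
    """Matching concepts divided by the smaller of the two concept counts."""
    request, document = set(request), set(document)
    if not request or not document:
        return 0.0  # assumption: an empty vector matches nothing
    matches = len(request & document)
    return matches / min(len(request), len(document))


def cosine_coefficient(request, document):
    """Matching concepts divided by sqrt(request count * document count)."""
    request, document = set(request), set(document)
    if not request or not document:
        return 0.0  # assumption: an empty vector matches nothing
    matches = len(request & document)
    return matches / math.sqrt(len(request) * len(document))


# Hypothetical concept numbers: 4 in the request, 9 in the document, 2 matching.
request = {1, 2, 3, 4}
document = {3, 4, 5, 6, 7, 8, 9, 10, 11}

print(overlap_coefficient(request, document))           # 0.5
print(round(cosine_coefficient(request, document), 4))  # 0.3333
```

With these hypothetical vectors the overlap value (2/4) exceeds the cosine value (2/6), because overlap ignores the size of the longer vector while cosine takes both sizes into account.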
As an example illustrating the two functions, a document vector
(b) represented by 18 concept numbers is to be matched against a request
vector (a) represented by 8 concept numbers, where 5 concept numbers match: