Scientific Report No. IRS-13 Information Storage and Retrieval
Search Matching Functions
E. M. Keen
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
and unimportant request notions so that in some cases otherwise similarly
matched relevant and non-relevant documents are correctly separated.
These conclusions are based on the average results observed for three
sets of more than 30 requests; since the more detailed results presented
show that not all the individual requests perform best with cosine numeric,
it must be asked whether cosine numeric should be exclusively used for automatic
retrieval systems. It might be possible to develop methods to predict, in
advance of the search of a request, the matching function that would perform
best, assuming that several optimal functions were provided. Analysis has
shown that accurate advance prediction by automatic means is very difficult:
the request sets in use have been divided by several criteria, such as length
of request, measured by number of request concepts, and generality of request,
measured by number of documents assessed as relevant in the collection, but
no correlation between these criteria and the best matching function has
been discovered.
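
The matching functions compared in this section can be illustrated with a minimal sketch. It assumes the usual definitions: cosine correlation as the normalized inner product of the request and document vectors, and overlap correlation as the sum of the smaller weights divided by the smaller of the two vector totals. "Numeric" is taken to mean that concept weights are used as given, and "logical" that every non-zero weight is replaced by 1. The function names, the dictionary representation, and the sample weights are hypothetical and are not taken from the system described here.

```python
import math


def to_logical(vec):
    """Replace every positive weight by 1.0 (assumed meaning of 'logical')."""
    return {term: 1.0 for term, weight in vec.items() if weight > 0}


def cosine(q, d):
    """Cosine correlation: inner product divided by the product of vector lengths."""
    dot = sum(w * d.get(t, 0.0) for t, w in q.items())
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    norm_d = math.sqrt(sum(w * w for w in d.values()))
    if norm_q == 0 or norm_d == 0:
        return 0.0
    return dot / (norm_q * norm_d)


def overlap(q, d):
    """Overlap correlation: sum of minimum weights over the smaller vector total."""
    shared = sum(min(w, d.get(t, 0.0)) for t, w in q.items())
    denom = min(sum(q.values()), sum(d.values()))
    return shared / denom if denom > 0 else 0.0


def match(q, d, function="cosine", weights="numeric"):
    """Score a document against a request with one of the four combinations."""
    if weights == "logical":
        q, d = to_logical(q), to_logical(d)
    scorer = cosine if function == "cosine" else overlap
    return scorer(q, d)


# Hypothetical request and document vectors (concept -> weight).
request = {"a": 2.0, "b": 1.0}
document = {"a": 1.0, "b": 1.0, "c": 3.0}

for function in ("cosine", "overlap"):
    for weights in ("numeric", "logical"):
        score = match(request, document, function, weights)
        print(f"{function:7s} {weights:7s} {score:.4f}")
```

In this example the document contains every request concept, so overlap logical gives the maximum score of 1, while the cosine functions also penalize the additional document concept "c".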
Advance prediction of the most suitable matching function might be
obtained from individual users: for example, users with very precisely stated
needs who wish to examine only those documents containing all their stated
request notions might be best satisfied by an overlap correlation, with
or without a weighting scheme. Users who supply many possible words to define
their general area of interest might be best satisfied with the cosine numeric
function. However, two sets of results examined suggest that provision of
the best matching function only (cosine numeric) could provide acceptable
results for a majority of users. Figure 33 shows precision versus recall
graphs for the IRE-3 and Cran-1 collections, comparing the four possible
combinations of cosine, overlap, numeric and logical. Figure 34 shows that
in the Cran-1 results 66.7% to 78.6% of the requests prefer cosine numeric
to any of the other three functions. The IRE-3 collection shows less of a