Scientific Report No. IRS-13 Information Storage and Retrieval Search Matching Functions chapter E. M. Keen Harvard University Gerard Salton Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

and unimportant request notions so that in some cases otherwise similarly matched relevant and non-relevant documents are correctly separated. These conclusions are based on the average results observed for three sets of more than 30 requests. Since the more detailed results presented show that not all the individual requests perform best with cosine numeric, it must be asked whether cosine numeric should be used exclusively in automatic retrieval systems.

It might be possible to develop methods to predict, in advance of the search of a request, the matching function that would perform best, assuming that several optimal functions were provided. Analysis has shown that accurate advance prediction by automatic means is very difficult: the request sets in use have been divided by several criteria, such as length of request, measured by number of request concepts, and generality of request, measured by number of documents assessed as relevant in the collection, but no correlation between these criteria and the best matching function has been discovered.

Advance prediction of the most suitable matching function might instead be obtained from individual users. For example, users with very precisely stated needs who wish to examine only those documents containing all their stated request notions might be best satisfied by an overlap correlation, with or without a weighting scheme. Users who supply many possible words to define their general area of interest might be best satisfied with the cosine numeric function.
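As an illustration of the four combinations under discussion, the following is a minimal sketch in Python, not the system's own implementation. It assumes the usual textbook forms of the two correlations: cosine as the normalized inner product, and overlap as the sum of the smaller weights divided by the smaller of the two vector totals. "Numeric" is taken to mean that concept weights are used as given, and "logical" that every non-zero weight is replaced by 1. The function names, concept names, and weights are hypothetical.

```python
import math

def cosine(q, d):
    """Cosine correlation of two concept-weight dictionaries (assumed form)."""
    dot = sum(w * d.get(c, 0.0) for c, w in q.items())
    norm = math.sqrt(sum(w * w for w in q.values())) * \
           math.sqrt(sum(w * w for w in d.values()))
    return dot / norm if norm else 0.0

def overlap(q, d):
    """Overlap correlation: shared weight over the smaller vector total (assumed form)."""
    shared = sum(min(w, d.get(c, 0.0)) for c, w in q.items())
    smaller = min(sum(q.values()), sum(d.values()))
    return shared / smaller if smaller else 0.0

def logical(v):
    """Reduce a numeric (weighted) vector to a logical (binary) one."""
    return {c: 1.0 for c, w in v.items() if w > 0}

# Hypothetical data: "retrieval" is the important request notion.
request = {"retrieval": 3.0, "matching": 1.0}
doc_a = {"retrieval": 3.0, "evaluation": 1.0}   # matches the important notion
doc_b = {"matching": 3.0, "evaluation": 1.0}    # matches the unimportant notion

for name, doc in (("doc_a", doc_a), ("doc_b", doc_b)):
    print(name,
          "cosine numeric:", round(cosine(request, doc), 3),
          "cosine logical:", round(cosine(logical(request), logical(doc)), 3),
          "overlap numeric:", round(overlap(request, doc), 3),
          "overlap logical:", round(overlap(logical(request), logical(doc)), 3))
```

With these invented weights the two logical functions give both documents the same score, since each shares exactly one concept with the request, while the numeric functions rank doc_a above doc_b (about 0.9 against 0.3 for cosine). This is the kind of separation between important and unimportant request notions that weighting makes possible.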
However, two sets of results examined suggest that provision of the best matching function only (cosine numeric) could provide acceptable results for a majority of users. Figure 33 shows precision versus recall graphs for the IRE-3 and Cran-1 collections, comparing the four possible combinations of cosine, overlap, numeric and logical. Figure 34 shows that in the Cran-1 results 66.7% to 78.6% of the requests prefer cosine numeric to any of the other three functions. The IRE-3 collection shows less of a