Scientific Report No. IRS-13, Information Storage and Retrieval. Chapter: Search Matching Functions. E. M. Keen. Harvard University, Gerard Salton. Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

superiority for cosine numeric, but in fact all requests that do better with other functions do so by very small amounts. Even if a perfect advance choice of the best matching function were made for each request, the final result for the 34 requests of IRE-3 and the 42 of Cran-1 would be as given in Figure 35, showing that the best possible performance is only trivially superior to the use of cosine numeric for all requests. Studies of other matching functions in the context of the SMART system have been made [2 and Section IV], but have not been subjected to the extensive analysis and evaluation made of those reported here; no correlation coefficient that is superior to cosine has been discovered so far.

It is suggested that some studies of a different type are needed. Some quite basic questions about the preferred ordering of documents in a ranked output have not been investigated. For example, using a search request containing five concepts, is it preferable that the matching function place a document with four matching concepts, all of low weight, in front of one with three matching concepts of high weight? Also, if two documents both match on two equally weighted request concepts, one document having weights of 1 and 3 and the other weights of 2 and 2, should they both be regarded as equally matched with the request (as the numerator of cosine would show), or is the second document perhaps a preferred match? Questions such as these clearly cannot be answered except in a given retrieval context.
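The second question can be made concrete with a small calculation. The sketch below is illustrative only and is not taken from the SMART system: the concept names, the unit request weights, and the assumption that each document contains no concepts other than the two matched ones are all hypothetical. Under those assumptions the cosine numerator (the inner product) is the same for both documents, while the full cosine, which divides by the vector lengths, ranks the document weighted 2 and 2 ahead of the one weighted 1 and 3.

```python
import math

def inner_product(request, document):
    """Numerator of the cosine: sum of products of weights on shared concepts."""
    return sum(w * document.get(concept, 0.0) for concept, w in request.items())

def cosine(request, document):
    """Cosine correlation between two concept-weight vectors held as dicts."""
    norm_r = math.sqrt(sum(w * w for w in request.values()))
    norm_d = math.sqrt(sum(w * w for w in document.values()))
    if norm_r == 0.0 or norm_d == 0.0:
        return 0.0
    return inner_product(request, document) / (norm_r * norm_d)

# Hypothetical data: two equally weighted request concepts, and two
# documents assumed to contain only those two concepts.
request = {"c1": 1.0, "c2": 1.0}
doc_a = {"c1": 1.0, "c2": 3.0}   # weights 1 and 3
doc_b = {"c1": 2.0, "c2": 2.0}   # weights 2 and 2

for name, doc in (("A (1, 3)", doc_a), ("B (2, 2)", doc_b)):
    print(f"document {name}: numerator = {inner_product(request, doc):.1f}, "
          f"cosine = {cosine(request, doc):.3f}")
```

If the documents carried additional, unmatched concepts, their vector lengths would change and the ordering could differ, which is one reason the question depends on the retrieval context.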
A "hand ranking" study is suggested, in which persons would be asked to rank documents in relation to search requests in the order in which they as users would wish to see the documents. The persons doing the ranking would, of course, be given no information as to which documents were actually judged relevant by the requestor, and the experiment could be carried out using several permutations of the variations suggested in Figure 36. The results could be directly evaluated by performance measure-