IRS13
Scientific Report No. IRS-13 Information Storage and Retrieval
Search Matching Functions
chapter
E. M. Keen
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
superiority for cosine numeric, but in fact all requests that do better with
other functions do so by very small amounts. Even if a perfect advance choice
of the best matching function were made for each request, the final result
for the 34 requests of IRE-3 and the 42 of Cran-1 would be as given in Figure
35, showing that the final best possible performance is only trivially
superior to the use of cosine numeric for all requests.
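
The "perfect advance choice" bound described above can be sketched as follows: for each request, take the best score achieved by any matching function, then average over requests. This is only a minimal illustration. The function name `oracle_upper_bound`, the label `other_function`, and the scores in `toy_scores` are hypothetical and invented for the example; they are not results from IRE-3 or Cran-1.

```python
def oracle_upper_bound(scores):
    """Compare each matching function with a per-request oracle choice.

    scores maps a matching-function name to a list of per-request
    performance values (one entry per request, same order in every list).
    Returns (mean score per function, mean score of the oracle choice).
    """
    names = list(scores)
    n_requests = len(scores[names[0]])
    per_function_mean = {
        name: sum(values) / n_requests for name, values in scores.items()
    }
    # The oracle picks, for every request, whichever function scored best.
    oracle_mean = sum(
        max(scores[name][i] for name in names) for i in range(n_requests)
    ) / n_requests
    return per_function_mean, oracle_mean


# Hypothetical per-request scores for four requests (invented values).
toy_scores = {
    "cosine_numeric": [0.60, 0.45, 0.80, 0.30],
    "other_function": [0.55, 0.47, 0.70, 0.28],
}
per_function_mean, oracle_mean = oracle_upper_bound(toy_scores)
print(per_function_mean, oracle_mean)
```

In this invented case only one request does better with the other function, and only slightly, so the oracle mean exceeds the cosine numeric mean by a small margin. That is the pattern the text reports, though the figures here carry no evidential weight.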
Studies of other matching functions in the context of the SMART
system have been made [2 and Section IV], but have not been subjected to the
extensive analysis and evaluation made of those reported here; no correlation
coefficient that is superior to cosine has been discovered so far. It is
suggested that some studies of a different type are needed. Some quite basic
questions about the preferred ordering of documents in a ranked output have
not been investigated. For example, using a search request containing five
concepts, is it preferable that the matching function places a document with
four matching concepts all of low weights in front of one with three matching
concepts at high weights? Also, if two documents both match on two equally
weighted request concepts, one document having weights of 1 and 3, and the
other weights of 2 and 2, should they both be regarded as equally matched
with the request (as the numerator of cosine would show), or is the second
document perhaps a preferred match?
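
The two questions above can be made concrete with a small calculation. The sketch below computes the cosine numerator (the inner product) and the full cosine for both cases. It rests on assumptions not stated in the text: the request weights are all 1, "low" and "high" document weights are taken as 1 and 3, and the documents contain no concepts other than those shown. The helper names `dot` and `cosine` are illustrative only.

```python
import math


def dot(request, document):
    """Inner product of two weight vectors: the numerator of cosine."""
    return sum(r * d for r, d in zip(request, document))


def cosine(request, document):
    """Cosine correlation between a request vector and a document vector."""
    norm = math.sqrt(dot(request, request)) * math.sqrt(dot(document, document))
    return dot(request, document) / norm if norm else 0.0


# Two equally weighted request concepts (weights assumed to be 1).
request_2 = [1, 1]
doc_1_3 = [1, 3]  # document with weights of 1 and 3
doc_2_2 = [2, 2]  # document with weights of 2 and 2

# Five-concept request: four low-weight matches against three high-weight
# matches (the weights 1 and 3 are assumed for illustration).
request_5 = [1, 1, 1, 1, 1]
doc_four_low = [1, 1, 1, 1, 0]
doc_three_high = [3, 3, 3, 0, 0]

for name, req, doc in [
    ("weights 1 and 3", request_2, doc_1_3),
    ("weights 2 and 2", request_2, doc_2_2),
    ("four low matches", request_5, doc_four_low),
    ("three high matches", request_5, doc_three_high),
]:
    print(f"{name}: numerator={dot(req, doc)}, cosine={cosine(req, doc):.3f}")
```

Under these assumptions the numerators tie at 4 for the two-concept documents, while the full cosine gives about 0.894 for weights 1 and 3 and 1.0 for weights 2 and 2, so the denominator already favours the evenly weighted document. In the five-concept case the numerator favours the three high-weight matches (9 against 4), but the full cosine favours the four low-weight matches (about 0.894 against 0.775). Whether either ordering is the one a user would want is the open question the text raises.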
Questions such as these clearly cannot be answered except in a
given retrieval context. A "hand ranking" study is suggested, in which
persons would be asked to rank documents in relation to search requests in
the order in which they as users would wish to see the documents. The persons
doing the ranking would, of course, be given no information as to which
documents were actually judged relevant by the requestor, and the experiment
could be carried out using several permutations of the variations suggested
in Figure 36. The results could be directly evaluated by performance measure-