Scientific Report No. IRS-13 Information Storage and Retrieval

Evaluation Parameters (chapter)
E. M. Keen
Harvard University
Gerard Salton

Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

... by the search results (or in cases b) and c)) is extrapolated linearly to the 1.0 recall points, using the precision gained in the full search. Since the full search curve is drawn by the Quasi-Cranfield cut-off method, this means that cluster results are extrapolated to the precision achieved by the last relevant document in the full search. Figure 25 a) shows what happens to a cluster result in which no relevant documents at all are found: using the left-end extrapolation method recommended in part 14C, the whole cluster curve is an extrapolation from the chosen point at 1.0 recall in the full search curve.

Extrapolation could also be done by assigning a random rank position to each relevant document not found in the cluster search, bounded below by the rank of the last document recovered by the cluster search and above by the total collection size. It would also be feasible to extrapolate using the precision achieved if the relevant documents not found were ranked in the worst possible positions, that is, assuming that recall 1.0 is obtained only when the last document in the collection is examined. A further suggestion is to make use of the full search curve before it reaches 1.0 recall, joining the end of the cluster curve to some point along the full search curve. No comparison of these methods has yet been made, since the technique in use is conceptually as satisfactory as any of the other suggestions.
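The extrapolation alternatives discussed above can be illustrated with a minimal Python sketch. It is not the procedure used in the report: the function names, the 1-based rank convention, and the sample figures (a 200-document collection, 4 relevant documents, a cluster search that recovers 30 documents) are all assumptions made for illustration. The linear rule is one plausible reading of "extrapolated linearly to the 1.0 recall point", and the case in which no relevant documents are found is not covered.

```python
import random


def recall_precision_points(relevant_ranks, total_relevant):
    """Return (recall, precision) after each relevant document (1-based ranks)."""
    points = []
    for found, rank in enumerate(sorted(relevant_ranks), start=1):
        points.append((found / total_relevant, found / rank))
    return points


def linear_extrapolate(last_point, full_precision, recall):
    """Precision at `recall` on the straight line joining the last cluster
    point (r0, p0) to the point (1.0, full_precision). Assumes r0 < 1.0."""
    r0, p0 = last_point
    return p0 + (full_precision - p0) * (recall - r0) / (1.0 - r0)


def worst_case_ranks(found_ranks, total_relevant, collection_size):
    """Place every unfound relevant document at the very end of the collection,
    so that recall 1.0 is reached only at the last document."""
    missing = total_relevant - len(found_ranks)
    tail = range(collection_size - missing + 1, collection_size + 1)
    return sorted(found_ranks) + list(tail)


def random_ranks(found_ranks, total_relevant, last_recovered_rank,
                 collection_size, rng):
    """Give each unfound relevant document a distinct random rank between the
    last document recovered by the cluster search and the collection size."""
    missing = total_relevant - len(found_ranks)
    candidates = range(last_recovered_rank + 1, collection_size + 1)
    return sorted(found_ranks) + sorted(rng.sample(candidates, missing))


# Hypothetical example: relevant documents found at ranks 2 and 9.
found = [2, 9]
worst = worst_case_ranks(found, total_relevant=4, collection_size=200)
rand = random_ranks(found, 4, last_recovered_rank=30, collection_size=200,
                    rng=random.Random(0))
worst_curve = recall_precision_points(worst, 4)
rand_curve = recall_precision_points(rand, 4)
```

Under the worst-case rule the curve always ends at a precision equal to the number of relevant documents divided by the collection size, whereas the random rule gives a different tail on every run unless the generator is seeded.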
5. Measures for Varying Relevance Evaluation

Although the rendering of relevance decisions is a task quite separate from the considerations which go into the construction of performance measures reflecting system effectiveness, it may be desirable to use performance measures based on grades of relevance rather than on a binary decision of "relevant" or "non-relevant" alone. The performance characteristic curve suggested by Giuliano and Jones [8] is designed to use spectra of relevance, since in