Scientific Report No. IRS-13, Information Storage and Retrieval. Evaluation Parameters (chapter). E. M. Keen. Harvard University. Gerard Salton. Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

without extrapolation, then this data can be recorded at the ten recall levels on the curve, as was done in Figure 18.

D) Extrapolation Techniques for Evaluation of Cluster Searching

Experiments on cluster searching, many of which are described in Report I.S.R.-12, raise an additional problem when precision-recall curves of cluster results are to be averaged. The difficulty arises because, when only certain clusters of documents are searched rather than the total collection, some of the relevant documents are frequently not examined, so that no rank positions exist for them. This phenomenon is both expected and important, since the resulting "recall ceiling" is one of the vital factors used to evaluate cluster searching. An ideal precision curve resulting from a cluster search averaged over many requests would commence in the usual manner at the high precision end but would extend only as far as the recall ceiling, thus allowing a comparison with the ordinary full search curve only up to that ceiling.

The problem is reflected in Figure 2[OCRerr] for some hypothetical individual requests; it is seen there that some requests naturally do not reach the average recall ceiling, some exceed it, and others are not included on the plot at all, since no relevant documents are found in the cluster search. One solution would be to include in the average curve only those requests which supply some results, so that as the average curve approaches the recall ceiling, it would be based on fewer than the total number of requests. Other methods can also be suggested which employ extrapolation techniques, so that every request enters into the whole of the average curve.
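The averaging without extrapolation described above can be illustrated with a minimal sketch. It is not the SMART implementation. The function name, the data layout (one list of precision values per request, recorded at recall 0.1, 0.2, ... and stopping at that request's own recall ceiling), and the sample values are all assumptions made for illustration.

```python
# Hypothetical sketch: average precision at ten recall levels, counting at
# each level only the requests whose cluster search reached that level.

RECALL_LEVELS = [k / 10 for k in range(1, 11)]  # 0.1, 0.2, ..., 1.0


def average_without_extrapolation(requests):
    """Return (recall level, mean precision or None, contributing requests).

    `requests` is a list of precision lists. Entry i of a list is the
    precision at RECALL_LEVELS[i]; a list shorter than ten entries means the
    request stopped at its own recall ceiling. An empty list means the
    cluster search found no relevant documents for that request.
    """
    averaged = []
    for i, level in enumerate(RECALL_LEVELS):
        # Only requests that reached this recall level contribute a value.
        values = [curve[i] for curve in requests if len(curve) > i]
        mean = sum(values) / len(values) if values else None
        averaged.append((level, mean, len(values)))
    return averaged


# Made-up data: ceilings of 0.3 and 0.5, and one request with no results.
sample_requests = [
    [1.0, 0.8, 0.6],
    [0.5, 0.5, 0.4, 0.3, 0.2],
    [],
]

for level, mean, count in average_without_extrapolation(sample_requests):
    print(level, mean, count)
```

The third element of each tuple shows how the average rests on fewer requests as the recall ceiling is approached, and that a request with no results never enters the average at all.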
The first of these additional extrapolation techniques has been used exclusively in the test results obtained so far with the SMART system. As Figure 25 shows for three individual requests, the recall ceiling reached