Scientific Report No. ISR-13, Information Storage and Retrieval
Evaluation Parameters
E. M. Keen
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
without extrapolation, then this data can be recorded at the ten recall levels
on the curve, as was done in Figure 18.
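As an illustration of recording one request's results at ten recall levels, the sketch below assumes the levels are 0.1, 0.2, ..., 1.0 and assumes an interpolation rule in which the precision at a level is the highest precision observed at any equal or higher recall. The report's own rule may differ. The helper name `precision_at_levels` and its input encoding (1-based rank positions of the retrieved relevant documents) are hypothetical. A level the request never reaches is left empty (None) rather than extrapolated.

```python
from typing import Dict, List, Optional

# Assumed recall levels: 0.1, 0.2, ..., 1.0
RECALL_LEVELS = [round(0.1 * i, 1) for i in range(1, 11)]


def precision_at_levels(relevant_ranks: List[int],
                        total_relevant: int) -> Dict[float, Optional[float]]:
    """Precision of one request at ten recall levels (hypothetical helper).

    relevant_ranks: 1-based rank positions of the relevant documents retrieved.
    total_relevant: number of relevant documents for the request.
    A level that is never reached maps to None (no extrapolation is done).
    """
    # (recall, precision) observed each time a relevant document is retrieved
    points = [(k / total_relevant, k / rank)
              for k, rank in enumerate(sorted(relevant_ranks), start=1)]
    result: Dict[float, Optional[float]] = {}
    for level in RECALL_LEVELS:
        # Assumed interpolation: best precision at any recall >= this level
        candidates = [p for r, p in points if r >= level - 1e-9]
        result[level] = max(candidates) if candidates else None
    return result


print(precision_at_levels([1, 3, 6, 10], total_relevant=4))
```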
D) Extrapolation Techniques for Evaluation of Cluster Searching
Experiments on cluster searching, many of which are described in
report ISR-12, raise an additional problem when precision-recall curves
of cluster results are to be averaged. The difficulty arises because, when
only certain clusters of documents are searched, rather than the total
collection, some of the relevant documents are frequently not examined,
so that no rank positions exist for some of the relevant documents. This
phenomenon is both expected and important, since this "recall ceiling"
is one of the vital factors used to evaluate cluster searching.
An ideal precision-recall curve resulting from a cluster search, averaged over
many requests, would commence in the usual manner at the high-precision end
but would extend only as far as the recall ceiling, thus allowing a comparison
with the ordinary full-search curve only up to that ceiling.
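The recall ceiling can be made concrete with a small computation. The sketch assumes each request is represented by one entry per relevant document, holding that document's rank position in the cluster search, or None when the document lies in an unsearched cluster. This encoding and the function names are hypothetical, and the average taken here is a plain mean over requests, which may not be the exact averaging used in the report.

```python
from typing import List, Optional


def recall_ceiling(relevant_ranks: List[Optional[int]]) -> float:
    """Highest recall one request can reach in a cluster search.

    One entry per relevant document: its rank position, or None when the
    document lies in a cluster that was not searched (hypothetical encoding).
    """
    if not relevant_ranks:
        raise ValueError("request has no relevant documents")
    ranked = sum(1 for rank in relevant_ranks if rank is not None)
    return ranked / len(relevant_ranks)


def average_recall_ceiling(requests: List[List[Optional[int]]]) -> float:
    """Plain mean of the per-request recall ceilings."""
    return sum(recall_ceiling(r) for r in requests) / len(requests)


requests = [
    [1, 4, None, None],  # half of the relevant documents were examined
    [2, 5, 9],           # all relevant documents were examined
    [None, None],        # no relevant document found by the cluster search
]
print([recall_ceiling(r) for r in requests])  # [0.5, 1.0, 0.0]
print(average_recall_ceiling(requests))       # 0.5
```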
The problem is illustrated in Figure 2[?] for some hypothetical individual
requests. Some requests naturally do not reach the average recall ceiling,
some exceed it, and others do not appear on the plot at all, because the
cluster search finds none of their relevant documents. One solution would
be to include in the average curve only those requests which supply some
results, so that as the average curve approaches the recall ceiling, it
would be based on fewer than the total number of requests.
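This first solution, averaging at each recall level only over the requests that supply a value there, might be sketched as follows. Each request is assumed (hypothetically) to be a mapping from recall level to precision, with None at levels beyond that request's recall ceiling. The function name `average_contributing` is illustrative; it returns both the average and the number of requests behind it, so the shrinking request count near the ceiling is visible.

```python
from typing import Dict, List, Optional, Tuple

Curve = Dict[float, Optional[float]]  # recall level -> precision (None = not reached)


def average_contributing(curves: List[Curve]) -> Dict[float, Tuple[Optional[float], int]]:
    """Average precision per recall level over only the requests with a value there."""
    levels = sorted({level for curve in curves for level in curve})
    averaged: Dict[float, Tuple[Optional[float], int]] = {}
    for level in levels:
        values = [c[level] for c in curves if c.get(level) is not None]
        if values:
            averaged[level] = (sum(values) / len(values), len(values))
        else:
            averaged[level] = (None, 0)  # no request reaches this level
    return averaged


curves = [
    {0.1: 1.0, 0.2: 0.8, 0.3: None},   # reaches 0.2 but not 0.3
    {0.1: 0.5, 0.2: None, 0.3: None},  # reaches 0.1 but not 0.2
    {0.1: None, 0.2: None, 0.3: None}, # no relevant documents found at all
]
for level, (precision, n) in average_contributing(curves).items():
    print(level, precision, n)
```

The third request never contributes, and the second drops out after 0.1, so the point at 0.2 rests on a single request; this is the weakness that motivates the extrapolation methods.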
Other methods can also be suggested which employ extrapolation techniques
so that every request enters into the whole of the average curve.
The first of these additional extrapolation techniques has been
used exclusively in the test results obtained so far with the SMART system.
As Figure 25 shows for three individual requests, the recall ceiling reached