Scientific Report No. IRS-13 Information Storage and Retrieval
Evaluation Parameters
E. M. Keen
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
areas covered by a set of requests. No suitable method of achieving this
type of comparison has yet been developed, but it is crucial to further
research in this area because clearly some collections are more hostile
to good retrieval performance simply because they contain a large number
of potentially retrievable non-relevant items.
Four desirable properties of retrieval performance measures are
suggested by John Swets [3], namely that the measure should be:
- able to measure retrieval effectiveness alone, separately from
other criteria such as cost;
- independent of any particular cut-off;
- a single number;
- on a number scale to give absolute and relative values.
Swets, however, does not recognize the possibility that different purposes
and measurement viewpoints may be important, and the resulting measure
proposed takes no account of the user viewpoint in a directly meaningful
way. From matters discussed already, several other properties appear
desirable:
- ability to reflect the success of a system in meeting needs of
different types, such as high precision, or high recall;
- ability to interpret measures directly in terms of a user's
experience: for example, 0.2 precision at 0.5 recall means
that the user has examined half the relevant documents available,
while at the same time four non-relevant items were
looked at for every relevant one;
- ability to compare systems of differing generality.
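The precision and recall example given in the second property above can be checked with a little arithmetic. The following is a minimal sketch in Python and is not part of the original report; the function name `user_experience` and the figure of 10 relevant documents in the collection are assumptions made only for illustration.

```python
from fractions import Fraction


def user_experience(precision, recall, total_relevant):
    """Translate a (precision, recall) point into counts a user would see.

    total_relevant is a hypothetical number of relevant documents in the
    collection; it is an illustrative assumption, not a value from the report.
    """
    # Convert through str() so that 0.2 becomes exactly 1/5, not a float approximation.
    p = Fraction(str(precision))
    r = Fraction(str(recall))

    relevant_seen = r * total_relevant           # recall = relevant seen / total relevant
    examined = relevant_seen / p                 # precision = relevant seen / examined
    nonrelevant_seen = examined - relevant_seen

    return {
        "relevant_seen": relevant_seen,
        "nonrelevant_seen": nonrelevant_seen,
        # Non-relevant items examined for each relevant one: (1 - p) / p
        "nonrelevant_per_relevant": (1 - p) / p,
    }


result = user_experience(0.2, 0.5, total_relevant=10)
print(result["relevant_seen"], result["nonrelevant_seen"],
      result["nonrelevant_per_relevant"])
```

Under the assumed collection of 10 relevant documents, 0.2 precision at 0.5 recall corresponds to 5 relevant and 20 non-relevant documents examined, that is, four non-relevant for every relevant one. The ratio depends only on precision, so it holds for any collection size.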
Other properties can be suggested, but the purposes and viewpoints here
suggested should override such properties as the "single number" or
"absolute and relative scales", which are desirable perhaps but not essential.
The purposes, viewpoints and properties discussed are summarized in
Figure 2.