ISR10
Scientific Report No. ISR-10 Information Storage and Retrieval
Evaluation of Document Retrieval Systems
chapter
Joseph John Rocchio
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
4. Cutoff-Independent Performance Indices
A. Derivation
Performance indices for document retrieval systems which are
based on a contingency table description (as introduced in section 2)
assume that a retrieval operation partitions the reference collection,
i.e. identifies a retrieved subset. In the model system considered
in this thesis (vector indexing and cosine correlation query-document
matching), and in other models of interest (see Chapter 4), the result
of a retrieval operation is more accurately described by the
distribution of the matching coefficient over the reference collection
or by the ordering induced on the document set from this distribution.
The use of partition based evaluation parameters for such systems
requires, then, that some decision function (or cutoff criterion) be
introduced into the retrieval process. Operationally, the number of
retrieved documents a user will examine is likely to be dependent on
a number of subjective variables. There is, therefore, considerable
difficulty in the a priori specification of a meaningful partitioning
algorithm. For this reason, then, some performance measures are
derived here which are functionally dependent on the full ordering of
the reference collection produced by a retrieval operation. Such
measures eliminate the need to introduce any notion of cutoff.
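
The distinction drawn above between an ordering and a partition can be illustrated with a minimal sketch. It assumes documents and requests are represented as equal-length term-weight vectors matched by cosine correlation, as in the model system described. The function names (cosine, rank_collection, retrieved_subset), the four sample vectors, and the cutoff value are hypothetical and chosen only for illustration; they do not come from the report.

```python
import math


def cosine(u, v):
    """Cosine correlation between two equal-length term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention assumed here: a zero vector matches nothing
    return dot / (norm_u * norm_v)


def rank_collection(query, documents):
    """Order document ids by decreasing matching coefficient with the query."""
    scores = {doc_id: cosine(query, vec) for doc_id, vec in documents.items()}
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


def retrieved_subset(ranking, cutoff):
    """Partition-based view: keep only the first `cutoff` ranked documents."""
    return set(ranking[:cutoff])


# Hypothetical reference collection and request (illustrative values only).
documents = {
    "d1": [1, 0, 1],
    "d2": [0, 1, 0],
    "d3": [1, 1, 0],
    "d4": [2, 0, 1],
}
query = [1, 0, 1]

ranking = rank_collection(query, documents)  # full ordering, no cutoff needed
top_two = retrieved_subset(ranking, 2)       # partition requires a cutoff choice
```

The ordering is determined entirely by the matching coefficients, whereas the retrieved subset changes with every choice of cutoff; it is this arbitrary parameter that measures defined on the full ordering avoid.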
Under the assumption that the ordering induced on the set of
reference documents by the search process M is the principal result
of a retrieval operation and that a set of relevant documents D_R is
available corresponding to each request q, the objective of a