IRE
Information Retrieval Experiment
The methodology of information retrieval experiment
chapter
Stephen E. Robertson
Butterworth & Company
Karen Sparck Jones
All rights reserved. No part of this publication may be reproduced
or transmitted in any form or by any means, including photocopying
and recording, without the written permission of the copyright holder,
application for which should be addressed to the Publishers. Such
written permission must also be obtained before any part of this
publication is stored in a retrieval system of any nature.
Collections are normally communicated in machine-readable form (on tape);
documents are usually available as the texts of abstracts, and/or some form
of index representation.
The existence of these collections has had a considerable influence on the
direction of research in the field, for the simple reason that some processes
(such as automatic indexing from full text) are not possible on these
collections as they currently exist. In these circumstances, it is at least
arguable that the research community should set up one or more genuinely
portable test collections: collections that are designed as general-purpose
research tools, rather than taking on that role by accident. Although some
work has been done in the last few years on the desirable characteristics of a
portable test collection, no such collection has been built. But this is clearly
a direction in which future laboratory work in document retrieval might
move.
2.4 Statistical ideas and questions
Why statistics?
A test of a retrieval system necessarily involves, as we have seen, some kind
of measurement (in a general sense of the word) of certain aspects of the way
the system works. But this information about the system is of necessity
historical: it concerns acts of retrieval which have already happened. The
only ultimate reason for testing a retrieval system must be to discover or infer
something about future acts of retrieval, either in the sense of future requests
put to the same system, or in the sense of general principles (from which
particular deductions about the future might be made). Such inferences are
the subject-matter of statistics.
More particularly, having performed a comparison of two systems on
specific samples of documents and requests, we may be interested in the
statistical significance of the difference, that is in whether the difference we
observe could be simply an accidental property of the sample or can be
assumed to represent a genuine characteristic of the populations. Further, we
may want to enlist the aid of statistical methods in discovering the underlying
reasons for what we observe.
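
As an illustration of what a significance question of this kind looks like in practice, the following is a minimal sketch of one simple procedure, a two-sided exact sign test on paired per-request scores. The choice of test, the function name sign_test_p_value, and the score values are all assumptions made for the example; nothing here is prescribed by the text, and the scores are invented rather than drawn from any experiment.

```python
from math import comb


def sign_test_p_value(scores_a, scores_b):
    """Two-sided exact sign test on paired per-request scores.

    Hypothetical helper: each position holds the score that system A
    and system B obtained on the same request.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("scores must be paired, one per request")

    # Count the requests on which each system did strictly better.
    wins_a = sum(1 for a, b in zip(scores_a, scores_b) if a > b)
    wins_b = sum(1 for a, b in zip(scores_a, scores_b) if a < b)

    # Tied requests carry no information about direction and are dropped.
    n = wins_a + wins_b
    if n == 0:
        return 1.0

    # Under the null hypothesis each non-tied request favours either
    # system with probability 1/2, so the win count is Binomial(n, 1/2).
    k = min(wins_a, wins_b)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Invented precision values for eight requests (illustration only).
system_a = [0.40, 0.55, 0.30, 0.70, 0.25, 0.60, 0.45, 0.50]
system_b = [0.35, 0.50, 0.30, 0.55, 0.30, 0.40, 0.35, 0.45]

p = sign_test_p_value(system_a, system_b)
print(f"two-sided sign test p-value: {p:.4f}")
```

With these invented scores, system A is better on six requests, worse on one, and tied on one, giving a p-value of 0.125: a difference of this size could quite plausibly be an accident of the sample. The test is only as good as its assumptions, in particular that the requests form a random sample from the population of interest.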
We can illustrate the peculiar difficulty of applying statistical methods to
information retrieval test data by first describing an unrealistically simple
situation. The rest of this chapter is devoted to an examination of the
underlying problems that emerge as we try to deal with reality. More concrete
recommendations and suggestions are provided by Tague in Chapter 5.
A simple case
Consider the case of an operational test which is designed to decide between
two existing alternative systems, for a particular collection of documents and
a particular clientele. Assume further that (a) the collection of documents is
complete, and will not be added to or changed in the future, and (b) the
characteristics of the clientele, and of the kinds of requests that they make,
will not change in the future. Then we have a reasonably good case from the
point of view of statistics; if we use a random sample of the incoming