IRE
Information Retrieval Experiment
The methodology of information retrieval experiment
chapter
Stephen E. Robertson
Butterworth & Company
Karen Sparck Jones
against some standard, or in terms of some criteria of success or failure, and
then look for possible ways of approaching the standard or reducing the
failures. Such a test would be more on the lines of an investigation as defined
in the editor's introduction.
Moving into the sphere of basic research, we may be concerned with the
general principles of information retrieval system design. Again we have the
distinction between experiments designed to help choose between alternative
general principles, and investigations aimed at the discovery of new
principles. Another objective for investigation might be to test the feasibility
of some particular design principle: that is, to test whether a system can be
designed on the basis of such a principle.
All these are good reasons for wanting to test information retrieval systems.
They are also all different, and impose different requirements and constraints
on the conduct of the test. So even before we get down to the pragmatic level
of how best to do things in different circumstances, we find that the question
'How should we test an information retrieval system?' has many answers.
Readers are invited to bear this in mind for the rest of the book!
The archetypal retrieval test
At this point, it is worth describing the general form of a retrieval test, as it
has evolved over the last 20 years. This is not to say that this form is correct,
or that any of its many variants is peculiar in any respect; it is merely
to establish a reference point on which to base further discussion.
What constitutes a test of a retrieval system or systems? First of all we
have to have the system itself: that is, the set of rules and procedures, and
human or mechanical operators of these rules and procedures.
Next, we must have the raw material on which the system works: the
documents and requests.
Tests in general, and experiments (in the sense defined) in particular, are
normally intended to answer specific questions. An important component of
any test is the experimental design: that is, the way in which the test is
organized in order to answer the appropriate questions.
All tests involve some kind of measurement, in the widest sense of the
word. In most information retrieval system tests, this includes (among other
things) some form of assessment of the system's response to each query.
Generally, a document collection will contain documents that might have
been useful in the context of a particular query, but which the system does
not retrieve. Many experiments include attempts to discover some or all of
these documents, with a view to assessing the performance of the system
against some standard.
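
As a minimal illustrative sketch (not part of the original text), the assessment of a single response against such a standard might be computed as follows. The passage does not name any particular measures; recall and precision are used here only because they are the conventional ones, and the function name, document identifiers and judgements are all hypothetical.

```python
from typing import Dict, Set


def assess_response(retrieved: Set[str], relevant: Set[str]) -> Dict[str, float]:
    """Compare one system response with the documents judged useful for the query.

    `retrieved` is the set of documents the system returned; `relevant` is the
    set of documents known (or assumed) to be useful, whether retrieved or not.
    """
    found = retrieved & relevant    # useful documents the system did retrieve
    missed = relevant - retrieved   # useful documents the system failed to retrieve
    recall = len(found) / len(relevant) if relevant else 0.0
    precision = len(found) / len(retrieved) if retrieved else 0.0
    return {"recall": recall, "precision": precision, "missed": float(len(missed))}


# Hypothetical query: four documents retrieved, three judged useful overall.
retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d2", "d4", "d7"}
result = assess_response(retrieved, relevant)
print(result)
```

The `missed` count can only be computed if the useful but unretrieved documents have been identified, which is why the effort to discover them matters to the test.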
Finally, we must have methods of analysing the results, in such a way as to
allow us to draw the appropriate conclusions, to answer the questions with
which we set out.
All these aspects are discussed further below.
Operational versus laboratory tests
In order to answer a specific question or questions unambiguously, a test
must be designed as far as possible to exclude any extraneous variations