IRE
Information Retrieval Experiment
The methodology of information retrieval experiment
chapter
Stephen E. Robertson
Butterworth & Company
Karen Sparck Jones
trivial problem, since one must use the results of an initial search without
feedback before trying the feedback procedure. Further, in this case (unlike
the last) there are no obvious solutions to be brought in from outside the field.
There are in fact two methods in use at present: 'residual ranking', which
involves removing the documents obtained by the initial search from the
collection (a different set for each query); and 'half collection' experiments,
where the initial search is done on one half of the collection and the feedback
is applied to the other half.
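As an illustration only, the two methods might be sketched as follows. The function names, the document identifiers, the alternating split and the use of precision at a cutoff are all assumptions made for this example; they are not taken from any particular test.

```python
def residual_ranking(feedback_ranking, initial_retrieved):
    """Remove documents already obtained by the initial search.

    The order of the remaining documents is preserved. The removed set
    is specific to one query, so this is applied per query.
    """
    seen = set(initial_retrieved)
    return [doc for doc in feedback_ranking if doc not in seen]


def split_collection(doc_ids):
    """Split a collection into two halves.

    Assumption: alternate over the sorted identifiers. A random split
    would serve equally well for a 'half collection' experiment.
    """
    ordered = sorted(doc_ids)
    return ordered[0::2], ordered[1::2]


def precision_at(ranking, relevant, k):
    """Fraction of the top k ranked documents that are relevant."""
    top = ranking[:k]
    if not top:
        return 0.0
    return sum(1 for doc in top if doc in relevant) / len(top)


# Hypothetical data for a single query.
initial_retrieved = ["d3", "d1", "d7"]            # seen before feedback
feedback_ranking = ["d1", "d3", "d4", "d7", "d6", "d2"]
relevant = {"d1", "d2", "d4"}

# Residual ranking: evaluate feedback only on documents not yet seen,
# against the relevant documents that remain in the residual collection.
residual = residual_ranking(feedback_ranking, initial_retrieved)
residual_relevant = relevant - set(initial_retrieved)
print(residual)                                       # ['d4', 'd6', 'd2']
print(precision_at(residual, residual_relevant, 2))   # 0.5

# Half collection: the initial search would run on first_half only,
# and the feedback query would then be evaluated on second_half only.
first_half, second_half = split_collection([f"d{i}" for i in range(1, 7)])
print(first_half, second_half)
```

In both cases the aim is the same: the documents used to construct the feedback query are kept out of the set on which the feedback query is evaluated.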
In general, however, experimental design ideas have not been applied in
retrieval experiments as much as perhaps they should have been. This may be
partly because so many of the variables of interest are difficult to control
directly; but we might reasonably expect more such application in the
future.
The limitations of statistics
Following this discussion of statistical ideas, two general points may be
made. First, statistical problems are pervasive in retrieval tests; second,
statistical and other considerations are closely intertwined. The process of
drawing conclusions, of any sort, from the results of a test involves calling on
various ideas, some of a statistical nature and some not; both sets of ideas are
necessary, and they are not easily separable.
Unfortunately, many of the basic statistical problems are difficult ones, not
necessarily solvable by textbook methods; indeed many of them have not yet
been solved. So the extent to which any experimenter can use formal
statistical methods when the situation demands them is severely limited.
Experimenters have in the past been, and will continue to be, forced to rely
on ad hoc methods and statistical intuition. I hope, of course, that the
basic work needed to develop new methods will be done; but in the meantime,
I hope that the above discussion will encourage an awareness of the nature
of the problems, as an aid to intuition.
2.5 Conclusions
There is no such thing as a watertight method for evaluating an information
retrieval system.
There is, on the other hand, a considerable battery of methods and
techniques for dealing with the various problems that arise in this endeavour.
Furthermore, each new test throws up new problems, or brings out
inadequacies in traditional solutions. So the archetype I have described is a
fluid concept, which will no doubt change as much in the next twenty years
as it did in the last. If, in 2001, this entire chapter is obsolete, so much the
better!
Bibliographic notes
Barring cross-references to other chapters, the text of this chapter has
deliberately been left without references, in the interests of readability.