The experimenter may have to develop procedures for obtaining cost data and
persuade system personnel to carry them out. Always investigate what cost
information is routinely collected before beginning an experiment. Don't
expect that procedures will necessarily be changed to suit your needs.
Persuasion, charm, and bribery may be required.
Cost effectiveness and cost benefit are really two distinct concepts. The
former relates the cost of a retrieval system to its effectiveness in serving its
users. Cooper13 has suggested the following measures of cost effectiveness:
C1 = cost/retrieved reference
C2 = cost/relevant reference
C3 = cost/precision
C4 = C2 - C1.
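As an illustration only, these measures might be computed as in the following sketch (the cost figure, the counts of retrieved and relevant retrieved references, and the function name are assumptions made for the example, not part of Cooper's formulation):

    def cost_effectiveness(cost, retrieved, relevant_retrieved):
        """Compute the four cost-effectiveness measures C1-C4."""
        c1 = cost / retrieved                    # cost per retrieved reference
        c2 = cost / relevant_retrieved           # cost per relevant reference
        precision = relevant_retrieved / retrieved
        c3 = cost / precision                    # cost per unit of precision
        c4 = c2 - c1                             # extra cost attributable to non-relevant output
        return {'C1': c1, 'C2': c2, 'C3': c3, 'C4': c4}

    # Example: a search costing 50 units that retrieves 40 references, 10 of them relevant
    print(cost_effectiveness(50.0, 40, 10))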
Cost/benefit relates the cost of a system to the overall benefit it provides
within a society or community or institution. Defining social benefit
operationally, rather than simply assessing its importance, is an idea whose
time has not yet come in information retrieval.
It must be emphasized that the operationalizations cited in this section are
purely examples, not in any sense the only valid definitions. Other
operationalizations may have equal or greater validity, depending on the
purpose and environment of the experiment.
5.4 Decision 4: What database to use?
There are three alternatives here, each with its own advantages and
disadvantages: (1) Build an experimental database; (2) Use an existing
experimental database; (3) Use an operational database.
Building your own database is expensive; unless the investigation is lavishly
funded, the database will necessarily be small. There is little evidence that, in
information retrieval, one can extrapolate findings from small databases to
large ones.
The size of an experimental database is a much-debated problem. Test
collections surveyed by Sparck Jones and van Rijsbergen14 ranged in size
from 300 to 50 000 documents. However, the larger databases were normally
derived from operational databases and/or used derived (e.g. from title)
rather than assigned indexing. The authors suggest that research needs point
to operationally-derived collections of 30 000 documents, with subcollections
of 2000 having special properties. Very little is known about the variability of
recall and precision under varying collection size. Tague and Farradane15
showed that the sampling error in estimating system recall and precision
from samples is inversely proportional to the square root of the collection size
(see Section 5.9).
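As a minimal numerical sketch of this square-root behaviour (assuming simple random sampling of relevance judgements and the usual standard-error formula for a proportion; this is not Tague and Farradane's own derivation), the standard error of an estimated precision shrinks in proportion to one over the square root of the number of documents sampled:

    import math
    import random

    def standard_error(p, n):
        """Approximate standard error of a proportion estimated from a simple random sample of size n."""
        return math.sqrt(p * (1 - p) / n)

    # Hypothetical relevance judgements: 30 per cent relevant, 70 per cent not
    random.seed(1)
    population = [1] * 300 + [0] * 700
    for n in (100, 400, 1600):
        sample = random.choices(population, k=n)   # sample with replacement for simplicity
        p = sum(sample) / n                        # estimated precision
        print(n, round(p, 3), round(standard_error(p, n), 4))

Quadrupling the number of judgements roughly halves the standard error, which is the practical force of the square-root relationship.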
Experimental databases, either self-constructed or obtained from previous
experiments, are almost essential in comparative indexing studies. Only then
is it possible to exercise the necessary control. Many different kinds of
control are needed, among them control of the collection coverage, the form
of surrogate, and the characteristics of the indexing. These will be discussed
individually.