Information Retrieval Experiment
Karen Sparck Jones (ed.), Butterworth & Company
Jean M. Tague, 'The pragmatics of information retrieval experimentation'
Decision 3: How to operationalize the variables?
i.e. n = 44 237.
A detailed analysis of the percentage of a pool of documents which must
be assessed in order to test statistically for a difference between two
methods, at specified significance and power levels, is given in Gilbert and
Sparck Jones². A second method gives the actual numbers of documents.
(A generic illustration of this kind of calculation is sketched after this list.)
(4) In comparative tests, instead of calculating absolute recall, calculate
relative recall. This is defined as follows:
Let A_i, i = 1, ..., m, be the set of relevant documents retrieved by the ith
treatment or level of the variable. Then the relative recall of the ith
system is defined by

$$R_i = \frac{|A_i|}{\left|\bigcup_{j=1}^{m} A_j\right|}$$

where |X| represents the number of elements in the set X. So the recall of
the ith treatment or level becomes the proportion of relevant documents
retrieved by any system which are retrieved by the ith treatment or level.
Relative recall seems appropriate in comparative testing, though it
obviously cannot be used to compare results from one experiment or database
to another. The values are heavily dependent on the particular treatments
under consideration. It is virtually the only possible approach to recall in
testing large operational systems.
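As a concrete illustration of the definition above, here is a minimal sketch of the relative recall computation; the treatment names and document identifiers are invented for the example.

```python
# Minimal sketch of relative recall for m treatments.
# Each treatment maps to the set of relevant documents it retrieved;
# the treatment names and document identifiers are hypothetical.
relevant_retrieved = {
    "boolean_search":   {"d01", "d04", "d07", "d12"},
    "ranked_search":    {"d01", "d04", "d09", "d12", "d15"},
    "thesaurus_search": {"d04", "d07", "d15"},
}

# Pool of relevant documents retrieved by *any* treatment (the denominator).
pool = set().union(*relevant_retrieved.values())

# Relative recall of each treatment: |A_i| / |union of all A_j|.
for name, docs in relevant_retrieved.items():
    print(f"{name}: {len(docs) / len(pool):.2f}")
```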
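To make the earlier point about significance and power levels concrete, the following is a minimal sketch of a standard two-proportion sample-size calculation. It is not the Gilbert and Sparck Jones procedure, only a generic normal-approximation formula; the proportions, significance level, and power are hypothetical values chosen for illustration.

```python
# Illustrative sample-size calculation for comparing two proportions
# (e.g. the precision of two retrieval methods) at given significance
# and power levels, using the normal-approximation formula.
# This is NOT the Gilbert and Sparck Jones procedure; p1 and p2 below
# are hypothetical figures chosen for the example.
import math
from scipy.stats import norm

def documents_needed(p1, p2, alpha=0.05, power=0.80):
    """Documents to assess per method to detect the difference p1 - p2
    with a two-sided test at significance `alpha` and the given power."""
    z_alpha = norm.ppf(1 - alpha / 2)   # critical value for the test
    z_beta = norm.ppf(power)            # quantile corresponding to power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = ((z_alpha + z_beta) ** 2 * variance) / (p1 - p2) ** 2
    return math.ceil(n)                 # round up to whole documents

print(documents_needed(0.40, 0.50))     # about 385 documents per method
```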
Evaluation also considers variables relating to efficiency: time, cost,
cost/benefit, cost/effectiveness. Although times such as searching time,
document delivery time, or total response time (i.e. the time between the first
and final contact of the user with a system) present no conceptual difficulties,
in practice, with operational systems, the values are difficult to collect.
Computer systems usually provide information about connect time (i.e.
elapsed time) and CPU time (time the computer was actually processing
data). Problems may arise with computer down time. Frequently, when the
system crashes, no record will remain of time already spent on the system (or
money either, which may be an economic advantage but an experimental
problem). If system crashes are frequent with online systems, searchers are
advised to keep their own time records as well.
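One simple way for a searcher to keep such an independent record is a log that writes each event to disk as soon as it occurs, so that a crash does not wipe out the times already noted. The sketch below is only an illustration; the file name and event labels are arbitrary choices.

```python
# Minimal sketch of a searcher's own time log. Each event is appended to a
# CSV file as soon as it happens, so a crash mid-session does not lose the
# record of time already spent. File name and labels are arbitrary choices.
import csv
from datetime import datetime, timezone

LOG_FILE = "search_time_log.csv"

def log_event(search_id: str, event: str) -> None:
    """Append one timestamped event (e.g. 'start', 'crash', 'end')."""
    with open(LOG_FILE, "a", newline="") as f:
        csv.writer(f).writerow(
            [search_id, event, datetime.now(timezone.utc).isoformat()]
        )

log_event("Q17", "start")
log_event("Q17", "system crash")   # noted by the searcher, not the computer
log_event("Q17", "end")
```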
Paralleling computer crashes is the problem of interruptions in manual
searching. If these are more than remote possibilities, then each searcher
rather than a single time keeper should keep time records.
Costing a retrieval system, overall and for individual searches, is not a
trivial undertaking. Such costs must include:
Personnel time: professional and clerical
Communication time
Equipment costs, suitably amortized
Supplies
Document reproduction
System overhead: rent, utilities, taxes, etc.
among other items. Sometimes the cost to the user of the time he or she
spends interacting with the system is also included.
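A rough per-search cost can be assembled from components like these, as in the minimal sketch below; every figure and cost head in it is hypothetical, and the equipment cost is amortized by a simple straight-line spread over the searches expected during its life.

```python
# Rough per-search cost sketch built from the components listed above.
# All figures are hypothetical; equipment cost is amortized straight-line
# over the number of searches expected during its useful life.
cost_components = {
    "professional time (1.0 h @ $20/h)": 20.00,
    "clerical time (0.5 h @ $8/h)":       4.00,
    "communication (connect charges)":    6.50,
    "supplies":                           0.75,
    "document reproduction":              3.25,
    "system overhead share":              2.00,
}

equipment_cost = 5000.00       # terminal purchase price (hypothetical)
expected_searches = 10000      # searches over the equipment's lifetime
amortized_equipment = equipment_cost / expected_searches

total = sum(cost_components.values()) + amortized_equipment
print(f"cost per search: ${total:.2f}")
```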
Obtaining cost data requires meticulous record-keeping by the staff. This
is not always an accepted practice in operational systems, and the