Decision 3: How to operationalize the variables?

i.e. n = 44 237. A detailed analysis of the percentage of a pool of documents which must be assessed in order to test statistically for a difference between two methods at specified significance and power levels is given in Gilbert and Sparck Jones². A second method gives the actual numbers of documents.

(4) In comparative tests, instead of calculating absolute recall, calculate relative recall. This is defined as follows. Let A_i, i = 1, ..., m, be the set of relevant documents retrieved by the ith treatment or level of the variable. Then the relative recall of the ith system is defined by

    R_i = |A_i| / |A_1 ∪ A_2 ∪ ... ∪ A_m|

where |X| represents the number of elements in the set X. So the recall of the ith treatment or level becomes the proportion of relevant documents retrieved by any system which are retrieved by the ith treatment or level. Relative recall seems appropriate in comparative testing, though it obviously cannot be used to compare results from one experiment or database to another: the values are heavily dependent on the particular treatments under consideration. It is virtually the only possible approach to recall in testing large operational systems. (A short computational sketch of this definition appears at the end of the section.)

Evaluation also considers variables relating to efficiency: time, cost, cost/benefit, cost/effectiveness. Although times such as searching time, document delivery time or total response time (i.e. the time between the first and final contact of the user with a system) present no conceptual difficulties, in practice, with operational systems, the values are difficult to collect. Computer systems usually provide information about connect time (i.e. elapsed time) and CPU time (the time the computer was actually processing data). Problems may arise with computer down time. Frequently, when the system crashes, no record will remain of the time already spent on the system (or of the money either, which may be an economic advantage but an experimental problem). If system crashes are frequent with online systems, searchers are advised to keep their own time records as well. Paralleling computer crashes is the problem of interruptions in manual searching. If these are more than remote possibilities, then each searcher, rather than a single timekeeper, should keep time records.

Costing a retrieval system, overall and for individual searches, is not a trivial undertaking. Such costs must include, among other items:

    Personnel time, professional and clerical
    Communication time
    Equipment costs, suitably amortized
    Supplies
    Document reproduction
    System overhead: rent, utilities, taxes, etc.

Sometimes the cost to the user of the time he or she spends interacting with the system is also included. Obtaining cost data requires meticulous record-keeping by the staff. This is not always an accepted practice in operational systems, and the
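
To make the relative recall definition in item (4) concrete, here is a minimal sketch in Python. It assumes each treatment's relevant retrieved documents are available as a set of document identifiers; the function name, variable names and document identifiers are illustrative, not drawn from the chapter.

```python
def relative_recall(relevant_retrieved):
    """Compute relative recall for each treatment.

    relevant_retrieved: list of sets, where the ith set A_i holds the
    identifiers of relevant documents retrieved by the ith treatment.
    Returns a list of R_i = |A_i| / |A_1 ∪ ... ∪ A_m|.
    """
    # Pool of relevant documents retrieved by any treatment.
    pool = set().union(*relevant_retrieved)
    if not pool:
        return [0.0 for _ in relevant_retrieved]
    return [len(a) / len(pool) for a in relevant_retrieved]


# Hypothetical example: three treatments searched against the same request.
A = [{1, 2, 3, 5}, {2, 3, 4}, {1, 4, 6, 7, 8}]
print(relative_recall(A))  # pool has 8 documents: [0.5, 0.375, 0.625]
```

Because the denominator pool is built only from the treatments being compared, the resulting values cannot be carried over to another experiment or database, as noted above.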
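
The opening remark about testing for a difference between two methods at specified significance and power levels refers to the analysis in Gilbert and Sparck Jones², which is not reproduced here. Purely as a rough illustration of how chosen significance and power levels translate into a required number of assessed documents, the following sketch applies the standard normal-approximation sample-size formula for comparing two proportions; the proportions, significance level and power are invented placeholders, not figures from the chapter.

```python
from scipy.stats import norm

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate number of documents needed per method to detect a
    difference between two proportions p1 and p2 (e.g. the precision of
    two methods) with a two-sided test at significance alpha and the
    given power. Standard normal-approximation formula; illustrative only."""
    z_alpha = norm.ppf(1 - alpha / 2)   # critical value for the two-sided test
    z_beta = norm.ppf(power)            # quantile corresponding to the power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return (z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2

# Hypothetical example: detecting a 0.05 difference around 0.5
print(round(n_per_group(0.50, 0.55)))   # roughly 1560 documents per method
```

Small differences at conventional significance and power levels quickly demand assessment of thousands of documents, which is why the chapter treats the size of the assessed pool as a serious practical constraint.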