Information Retrieval Experiment, ed. Karen Sparck Jones (Butterworth & Company). Chapter: The pragmatics of information retrieval experimentation, by Jean M. Tague.

The experimenter may have to develop procedures for obtaining cost data and persuade system personnel to carry them out. Always investigate what cost information is routinely collected before beginning an experiment. Don't expect that procedures will necessarily be changed to suit your needs. Persuasion, charm, and bribery may be required.

Cost effectiveness and cost benefit are really two distinct concepts. The former relates the cost of a retrieval system to its effectiveness in serving its users. Cooper13 has suggested the following measures of cost effectiveness (a small worked sketch of these measures is given at the end of this section):

C1 = cost/retrieved reference
C2 = cost/relevant reference
C3 = cost/precision
C4 = C2 - C1.

Cost/benefit relates the cost of a system to the overall benefit it provides within a society, community, or institution. Defining social benefit operationally, rather than simply assessing its importance, is an idea whose time has not yet come in information retrieval.

It must be emphasized that the operationalizations cited in this section are purely examples, not in any sense the only valid definitions. Other approaches may have equal or greater validity, depending on the purpose and environment of the experiment.

5.4 Decision 4: What database to use?

There are three alternatives here, each with its own advantages and disadvantages:

(1) Build an experimental database;
(2) Use an existing experimental database;
(3) Use an operational database.

Building your own database is expensive, so that, unless the investigation is lavishly funded, it will necessarily be small. There is little evidence that, in information retrieval, one can extrapolate findings from small databases to large ones.

The size of an experimental database is a much-debated problem. Test collections surveyed by Sparck Jones and van Rijsbergen14 ranged in size from 300 to 50 000 documents. However, the larger databases were normally derived from operational databases and/or used derived (e.g. from title) rather than assigned indexing. The authors suggest that research needs appear to be for operationally derived collections of 30 000 documents, with subcollections of 2000 having special properties. Very little is known about the variability of recall and precision under varying collection size. Tague and Farradane15 showed that the sampling error in estimating system recall and precision from samples is inversely proportional to the square root of the collection size (see Section 5.9).

Experimental databases, either self-constructed or obtained from previous experiments, are almost essential in comparative indexing studies. Only then is it possible to exercise the necessary control. Many different kinds of control are needed, among them control of the collection coverage, the form of surrogate, and the characteristics of the indexing. These will be discussed individually.
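To make Cooper's cost effectiveness measures concrete, the following is a minimal sketch, not taken from the chapter, of how C1 to C4 might be computed for a single search; the function name and the cost and retrieval figures are hypothetical.

# A minimal sketch (an assumption, not from the chapter) of Cooper's
# cost effectiveness measures C1-C4 for a single search.

def cost_effectiveness(cost, retrieved, relevant_retrieved):
    """Compute C1-C4 for one search from its total cost and output counts."""
    c1 = cost / retrieved                 # C1: cost per retrieved reference
    c2 = cost / relevant_retrieved        # C2: cost per relevant reference
    precision = relevant_retrieved / retrieved
    c3 = cost / precision                 # C3: cost divided by precision
    c4 = c2 - c1                          # C4: C2 - C1
    return c1, c2, c3, c4

# Hypothetical figures: a search costing 50.00 that retrieves 100
# references, of which 20 are judged relevant.
print(cost_effectiveness(50.0, 100, 20))  # -> (0.5, 2.5, 250.0, 2.0)

On this reading, C3 penalizes low-precision searches, since the same cost is divided by a smaller fraction of relevant output.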