Information Retrieval Experiment (IRE)
Chapter: The methodology of information retrieval experiment
Stephen E. Robertson
Karen Sparck Jones (ed.)
Butterworth & Company

Measurement: costs and times

If the object of a test is to determine something about cost-effectiveness or cost-benefit, then clearly we must measure costs or some related factor. Generally speaking, one is not concerned with the overall costs of the entire system, but with the costs of certain specific parts. Thus an operational system manager might want to know what happens (to both costs and performance) if a certain part of the system is changed. I argued above that for effectiveness, one must treat the entire system as a whole. For costs, generally, the opposite is true: since costs are in a strict sense additive, it is easiest and most sensible to cost only those parts of the system that may change.

This is not the place for an extensive discussion of how to go about costing a system or parts of it. It may be helpful, however, to note the almost universal use of the equation cost = time. If the difference in cost between two systems depends only on a difference in the time spent on one particular operation (say human time on indexing or machine time on searching), then one can do the appropriate cost-effectiveness comparison without ever bringing in explicit costs, simply regarding the time spent (by human or machine) as equivalent to cost. This avoids many accounting problems, and is normally the only method of including costs that is open to the laboratory researcher.
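As an illustration only, the cost = time substitution described above might be sketched in Python as follows. The class name, the function names, the choice of "relevant documents retrieved per query" as the effectiveness figure, and all the numbers are assumptions made for this sketch; none of them comes from the text, and other summaries of the comparison would be equally defensible.

```python
from dataclasses import dataclass


@dataclass
class SystemRun:
    """One system variant in a hypothetical laboratory test."""
    name: str
    seconds_per_query: float   # time on the single operation that differs (stands in for cost)
    relevant_retrieved: float  # mean relevant documents retrieved per query


def relevant_per_second(run: SystemRun) -> float:
    """Effectiveness per unit of time, with time regarded as equivalent to cost."""
    if run.seconds_per_query <= 0:
        raise ValueError("seconds_per_query must be positive")
    return run.relevant_retrieved / run.seconds_per_query


def extra_seconds_per_extra_relevant(base: SystemRun, alt: SystemRun) -> float:
    """Additional time spent by alt for each additional relevant document it retrieves."""
    gain = alt.relevant_retrieved - base.relevant_retrieved
    if gain == 0:
        raise ValueError("the two runs retrieve the same number of relevant documents")
    return (alt.seconds_per_query - base.seconds_per_query) / gain


# Invented figures, for illustration only.
fast = SystemRun("fast search", seconds_per_query=4.0, relevant_retrieved=6.0)
thorough = SystemRun("thorough search", seconds_per_query=10.0, relevant_retrieved=9.0)

for run in (fast, thorough):
    print(f"{run.name}: {relevant_per_second(run):.2f} relevant documents per second")
print("extra seconds per extra relevant document: "
      f"{extra_seconds_per_extra_relevant(fast, thorough):.2f}")
```

No money figure appears anywhere in the sketch: because only one operation differs between the two variants, the time spent on it carries the whole cost comparison.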
Measurement: coverage and currency

One group of variables that may be measured in connection with a particular information service consists of those which relate to the collection of documents, or to the systems for selection and acquisition, rather than to the system which retrieves from the collection. This group includes such variables as coverage and obsolescence. A considerable amount of attention has been devoted to these variables in the information science literature, under the general heading of bibliometrics. This work is, by and large, outside the scope of this book.

However, one specific connection should be made. One of the properties of a retrieval system which one might want to find out from an experiment is recall, or the proportion of the relevant documents in the collection that are retrieved. If coverage (for a particular user) is defined as the proportion of the relevant documents in the universe that are included in the collection, then it is clear that coverage (of the collection) and recall (of the system) together determine how many relevant documents the user sees, given how many there are in the universe. In other words, collection properties and retrieval system properties interact.

A second area of interaction concerns currency. In an SDI service, for example, the delay between a document being published and a user becoming aware of it is determined both by the selection and acquisition system and by the indexing and retrieval system.

As these examples indicate, in the final analysis the properties of a retrieval system should not be considered in isolation from other aspects of the information service of which it is part.

Measurement: explanatory variables

One may be concerned, especially in laboratory tests, with variables which might explain or predict the final performance of the system. These variables