little training. Some possible forms which such training might take have been proposed elsewhere9. Fourth, for some purposes it would be desirable to have the capability of testing an indexer's skill in making the requisite guesses; this has also been discussed elsewhere. Finally, it might be objected that since indexers cannot foretell the future, they would be unable to make the required probability estimates with any high degree of accuracy, either with gedanken experimentation or without. But it was never claimed that they could. It was merely suggested that the numbers they come up with under the gedanken approach are likely to be less inaccurate as probability estimates than the numbers they would otherwise come up with. And since in the last analysis an output ranking is always, explicitly or implicitly, a ranking by estimated probability, any improvement in the accuracy of estimation is a step forward.

Example 2: The indexer as gedanken experimenter, unweighted indexing

Next consider an even simpler retrieval system in which the indexing is unweighted (or 'binary'), where the searcher submits a single term as his request, and where the system responds by retrieving for him, as an unranked set, all documents indexed under the request term. The common subject card catalogue is, with minor elaborations, a system of this sort.

To decide whether or not to assign a term to a document, an indexer indexing documents for use in such a system can make use of what I have elsewhere called the 'Odds-Payoff' decision rule9,10. Three steps are required. First, the indexer estimates the odds against satisfaction after the fashion of the mental experiments of the previous example. Second, he performs another thought experiment whose result is a judgement of how many unsatisfactory documents a typical requestor submitting the term under consideration would be willing to examine and discard as the penalty to be paid to obtain the document to be indexed. Finally, he compares these two numbers and assigns the term if and only if the latter exceeds the former. A variant of this procedure substitutes a standard average value for the figure obtained in the second step, thereby eliminating that step and greatly simplifying the indexing process. The price of the simplification is that variations in degree of predicted usefulness among the documents are ignored.
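The three steps might be expressed in code somewhat as follows. This is only a sketch: the function names, the sample figures and the value chosen for the standard average are illustrative, not part of the rule itself.

```python
def odds_against_satisfaction(p_satisfaction):
    """Step 1: convert the indexer's estimated probability that a
    typical requestor of the term would be satisfied by this document
    into odds against satisfaction, (1 - p) / p."""
    return (1.0 - p_satisfaction) / p_satisfaction


def assign_term(p_satisfaction, tolerable_discards):
    """Step 3: assign the term if and only if the payoff figure from
    step 2 (how many unsatisfactory documents the requestor would
    examine and discard to obtain this one) exceeds the odds against."""
    return tolerable_discards > odds_against_satisfaction(p_satisfaction)


# Illustrative figures: the indexer judges 1 chance in 5 of
# satisfaction (odds against = 4 to 1), and judges that a typical
# requestor would tolerate 6 discards.  Since 6 > 4, assign the term.
print(assign_term(0.2, 6))                 # True

# The simplified variant: a fixed system-wide average replaces step 2,
# ignoring variations in predicted usefulness among documents.
STANDARD_PAYOFF = 3
print(assign_term(0.2, STANDARD_PAYOFF))   # False: 3 < 4, term withheld
```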
Example 3: The requestor as gedanken experimenter

Retrieval requests containing user-weighted terms have been in use for some time, but the weights are usually regarded vaguely as indicators of 'importance' rather than as estimates of probabilities or of functions of probabilities. Moreover, the weights are not manipulated by the system as though they had a probabilistic interpretation. Might it be possible to regard the weights as probabilistic estimates of some kind, and to reformulate the retrieval rules so that the weights are treated as such and used to compute explicit estimates of the final document probabilities? A crude system using ordinary unweighted document indexing but capable of handling request-term weights probabilistically might be designed somewhat as follows. A request consists, as in most ordinary weighted-request
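The passage breaks off at this point, so the sketch below is only one possible reading of such a design, not necessarily the one the chapter goes on to describe: each request weight is read as the requestor's estimated probability that a document indexed under that term would satisfy him, the weights of a document's matching terms are combined under a naive independence assumption, and the documents are ranked by the resulting explicit probability estimates. The combining rule and all names in the Python sketch are assumptions.

```python
def estimated_probability(doc_terms, request):
    """Estimate P(document satisfies the requestor) from the request-term
    weights, each read as P(satisfaction | document indexed under term).
    Matching terms are combined as if they were independent sources of
    evidence: P = 1 - prod(1 - w) over the request terms matched."""
    p_miss = 1.0
    for term, weight in request.items():
        if term in doc_terms:
            p_miss *= (1.0 - weight)
    return 1.0 - p_miss


# A request whose weights are given a probabilistic interpretation ...
request = {"retrieval": 0.3, "probability": 0.5}

# ... applied against ordinary unweighted ('binary') document indexing.
collection = {
    "doc1": {"retrieval", "probability", "indexing"},
    "doc2": {"retrieval"},
    "doc3": {"catalogue"},
}

# Rank the output explicitly by estimated probability of satisfaction.
ranked = sorted(collection,
                key=lambda d: estimated_probability(collection[d], request),
                reverse=True)
for d in ranked:
    print(d, round(estimated_probability(collection[d], request), 2))
# doc1 0.65, doc2 0.3, doc3 0.0
```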