IRE Information Retrieval Experiment Gedanken experimentation: An alternative to traditional system testing? chapter William S. Cooper Butterworth & Company Karen Sparck Jones

Systems of a list of descriptors accompanied by numeric weights and separated by commas, as for example

    IRON 50, MANUFACTURING 20, POLLUTION 5

The positive number following each term is to be interpreted as that term's `probability change factor' or `precision-boosting power', that is, as the multiplicative factor by which a document's probability of satisfaction is changed by the presence of the term on the document. Suppose for example that in a random draw from the entire collection the probability of obtaining a useful document is 0.001. Then the presence of the weight 50 on the term IRON indicates that the requestor, if he were to learn that the randomly drawn document had been indexed under IRON, would raise his personal estimate of that probability from 0.001 to 0.05. In other words, in a gedanken experiment in which he compares the density of useful documents in the whole collection to their density in the set indexed by IRON, he guesses the latter density to be some fifty times the former. The weights on the other request terms are interpreted independently in similar fashion.

The probabilistic computations needed to estimate a document's probability of satisfaction on the basis of such a request are involved and will not be presented here, though we hope to discuss them in a later publication.
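The interpretation of a single weight described above can be illustrated with a small sketch. It covers only the one-term case from the text (a prior of 0.001 and the request IRON 50, MANUFACTURING 20, POLLUTION 5), with each weight read independently of the others; it is not the combined multi-term computation, which the chapter does not present. The function name `updated_probability` and the cap at 1.0 are assumptions made for illustration.

```python
def updated_probability(prior: float, weight: float) -> float:
    """Scale a prior probability of usefulness by one term's weight.

    `weight` is the requestor's 'probability change factor' for a term.
    Capping the result at 1.0 is an assumption of this sketch, not
    something stated in the text.
    """
    if not 0.0 <= prior <= 1.0:
        raise ValueError("prior must be a probability between 0 and 1")
    if weight <= 0:
        raise ValueError("weight must be a positive number")
    return min(1.0, prior * weight)


# Request and prior taken from the example in the text.
request = {"IRON": 50, "MANUFACTURING": 20, "POLLUTION": 5}
prior = 0.001

# Each weight is interpreted on its own, as if only that term were known
# to be present on the randomly drawn document.
single_term_estimates = {
    term: updated_probability(prior, weight)
    for term, weight in request.items()
}

for term, estimate in single_term_estimates.items():
    print(f"{term}: {prior} -> {estimate:.3f}")
```

For IRON this reproduces the rise from 0.001 to 0.05 given in the text. Combining several terms on one document would additionally require the indexing statistics and the independence assumption discussed in the surrounding passage.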
These computations require as input not only the request weights but also some indexing statistics, specifically such data as the number of documents in the collection indexed under IRON, and the number indexed jointly under IRON and MANUFACTURING. An estimate of the total number of useful documents in the collection is needed too, but the final output ordering is not very sensitive to the value supplied for this estimate. An independence assumption of some sort is needed to circumvent the need for data on such higher order interactions as the degree of overlap in the collection among three or more terms. Paul Huizinga of the University of California has proposed that an independence assumption derived from the maximum entropy principle may be appropriate for this purpose [11].

Example 4: The system designer as gedanken experimenter

For systems where the users formulate their own requests without the aid of an information professional as intermediary, it might be unrealistic to hope for meaningful numeric values of the kind required for the query language of the previous example. A more workable system might merely require that the user attach to his request terms not numeric weights but non-numeric symbols indicative of his qualitative judgements of relative likelihood. Here for example is a six level scale of such judgements:

Symbol   Interpretation

A        Presence of clue would, other things being equal, make it vastly more likely that the document is useful. A clue of the strongest sort.

B        Presence of clue would make document a much more likely candidate. Clue is a typical `good' clue.

C        Presence of clue would make it somewhat more likely that the document would prove useful.