Information Retrieval Experiment
Chapter 11. Gedanken experimentation: an alternative to traditional system testing?
William S. Cooper
Edited by Karen Sparck Jones
Butterworth & Company

All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, including photocopying and recording, without the written permission of the copyright holder, application for which should be addressed to the Publishers. Such written permission must also be obtained before any part of this publication is stored in a retrieval system of any nature.

decisions would appear to be of this sort. Design decisions which can be taken in more leisurely fashion and are important enough may in some cases be worth, in addition, a little data-gathering effort. Experimental evaluations of the relative effectiveness of whole systems (or of aspects of systems tested within whole systems), along with some widely accepted approaches to system design, should probably be rethought to see if they cannot be reformulated in an explicitly probabilistic or utility-theoretic way which reduces the need for full-scale experimentation.

Some may view gedanken experimentation with alarm, feeling that it is a retreat from scientific certainty to wild guesswork propped up by an occasional counting exercise. This attitude would be understandable, but I suspect it greatly overestimates the reliability and usefulness of classical experimentation in our field, and underestimates the potential value of theory-supported system design and theory-guided thought about its input.

11.6 Acknowledgements

I am indebted to M. Buckland, K. Sparck Jones, M. Maron, and P. Wilson for their incisive but constructive critical commentary on an earlier draft of this chapter.

References

1. SWANSON, D. R. The evidence underlying the Cranfield results, Library Quarterly 35, 1-20 (1965)
2. SWANSON, D. R.
Some unexplained aspects of the Cranfield tests of indexing performance factors, Library Quarterly 41, 223-228 (1971)
3. HARTER, S. P. The Cranfield II relevance assessments: a critical evaluation, Library Quarterly 41, 229-243 (1971)
4. MARON, M. E. and KUHNS, J. L. On relevance, probabilistic indexing, and information retrieval, Journal of the ACM 7, 216-244 (1960)
5. BOOKSTEIN, A. and SWANSON, D. R. A decision-theoretic foundation for indexing, Journal of the American Society for Information Science 26, 45-50 (1975)
6. VAN RIJSBERGEN, C. J. Information Retrieval, 2nd edn, Butterworths, London (1979)
7. ROBERTSON, S. E. and SPARCK JONES, K. Relevance weighting of search terms, Journal of the American Society for Information Science 27, 129-146 (1976)
8. ROBERTSON, S. E. The probability ranking principle in IR, Journal of Documentation 33, 294-304 (1977)
9. COOPER, W. S. Indexing documents by gedanken experimentation, Journal of the American Society for Information Science 29, 107-119 (1978)
10. COOPER, W. S. and MARON, M. E. Foundations of probabilistic and utility-theoretic indexing, Journal of the ACM 25, 67-80 (1978)
11. GOOD, I. J. Maximum entropy for hypothesis formulation, especially for multidimensional contingency tables, Annals of Mathematical Statistics 34, 911-932 (1963)
12. COOPER, W. S. The potential usefulness of catalog access points other than author, title, and subject, Journal of the American Society for Information Science 21, 112-127 (1970)