IRE
Information Retrieval Experiment
Introduction
chapter
Karen Sparck Jones
Butterworth & Company
and addressed to similar topics, though with more emphasis on automatic
natural language indexing techniques, and more interest in a wider range of
system factors. Moreover, in other work motivated by the alternative
approach, seeking to base experiments on theory and so explain rather than
simply exhibit system behaviour, the influence of Cranfield is often visible.
Thus while much experimental work is explicitly indebted to Cranfield 2,
much more has been coloured by it. This book is rightly dedicated to Cyril
Cleverdon.
The Cranfield test showed what hard work experimental information
retrieval is: collecting data, conducting searches, computing performance
figures all take substantial time and effort. There is work in the essentials of
experiment: maintaining systematic procedures, imposing controls on
variables; and there is work in the realities of research: varying approaches
in the search for understanding and explanation, and rehashing past studies
for consistency and solidity. Salton (Chapter 15) has noted that the Smart
Project has involved thousands of tests, implying a large use of man and
machine power; yet there are important aspects of retrieval system use and
behaviour which have not been studied in these tests.
Since information is claimed to be of increasing importance, at least in
twentieth century high technology cultures, while information systems are
little understood, there is every justification for information system
investigation and experiment. Existing information systems reflect long and
extensive experience, but it does not follow that such systems cannot be
improved, and that new technologies may not be exploited for wholly new
types of system. It is arguable that our current understanding of information
processing is like that of sixteenth century herbalists: it embodies some
observation and insight, but lacks detailed analysis and supporting theory.
Unfortunately, though many retrieval experiments and investigations
have been carried out in the last 20 years, much of the experience in the
conduct of tests which has been gained is not very accessible. Published
papers tend to be cleaned-up accounts of objectives, general methods, and
results. Project reports may be much fuller, but even here it is often extremely
difficult to find out exactly what was done, or why it was done. Though an
improvement in the quality of experiments is detectable, far too many of
those reported are defective, and in many cases defective in recognized ways,
for which remedies are available. As all but the most limited test is a major
enterprise, it is a pity that so much effort should be wasted. The best that can
be said about many reported studies is that even if they are individually
dubious they may collectively point in the same direction. This is perhaps
something, but it is not much. The object of this volume is to make available
the experience in information retrieval testing of its contributors, in the hope
that this will lead to more fruitful and useful testing in the future.
The book is designed to treat information retrieval experiment and
investigation in a comprehensive way, relevant both to pure research and to
operational practice. Pure research naturally leads to experiment, but system
operators may also want to know how their system is working. Both research
workers and practitioners may know what they want to find out, but it may
not be at all obvious how it is to be found out, and in particular what the best
specific testing methods are. This is clearly seen in evaluation tests. Most
tests are intended to evaluate the performance of existing or proposed