IRE
Information Retrieval Experiment
Introduction
chapter
Karen Sparck Jones
Butterworth & Company
Karen Sparck Jones
All rights reserved. No part of this publication may be reproduced
or transmitted in any form or by any means, including photocopying
and recording, without the written permission of the copyright holder,
application for which should be addressed to the Publishers. Such
written permission must also be obtained before any part of this
publication is stored in a retrieval system of any nature.
retrieval systems or their individual components. A good deal has been
written about evaluation: what aspects of system performance, under the
general headings of effectiveness and efficiency, should be evaluated and
what form the evaluation should take. Evaluation implies experiment, or at
least observation. But what to do and how to do it are distinct. For example,
there is a difference between an interest in studying a method of indexing,
evaluating it via recall, and actually carrying out the study by having certain
documents indexed by certain people and searched in a certain way to satisfy
certain needs. The general assumption tends to be that if you know what you
want to evaluate, with given evaluation criteria, the appropriate experiment
is obvious. Experience shows that this is not the case, because the
characteristics of retrieval systems are so difficult to determine and their
implication for experiment so difficult to identify.
Much has nevertheless been learnt in the last 20 years about the conduct
of retrieval system tests, and this book attempts to bring the varied
experience of its contributors to bear on all aspects of retrieval experiment.
Part 1 deals with general topics applicable to all experiments and
investigations: methodological issues, the relation between testing and
evaluation, the problems presented by inaccessible system factors like
meaning, and the proper provision of data. Part 2 covers different types of
test: real-life tests and laboratory tests, with a separate treatment of manual
and automatic systems in each case; simulation tests, and gedanken
experiments. Though the different types of test have much in common, they
present distinct problems and demand different approaches in testing. In
Part 3, specific retrieval tests are used to illustrate and amplify the points
made in the previous sections. Chapter 12 considers the test history of the last
20 years through major and representative tests; Chapter 13 examines
Cranfield 1 and 2; Chapter 14 analyses a specific experiment in detail, and
Chapter 15 the long-term Smart Project. In the book as a whole, the emphasis
is on a detailed treatment of the issues involved in information retrieval
testing, and especially on the less obvious problems occurring in the design
and conduct of tests. Throughout, examples are used to illustrate the points
made.
It will be evident that the theme of the book is a large one, with many
facets. The chapters moreover embody individual views of their specific
topics. For this reason, it is appropriate to conclude this Introduction with
the interpretations of the key terms that underlie all the chapters.
These terms have been used in these senses so far, but to lead into Part 1 they
need an explicit rather than implicit characterization. However, since the
characterization is filled out by the book as a whole, all that is required here
is a summary indication of the use of the key terms informing the different
chapters.
Thus as this book is about information retrieval experiment, we need to
say what is meant by 'experiment', and how it is related to 'investigation'. An
experiment is designed to answer the question 'What happens if you do X?',
or 'What happens if you do X rather than Y?'. An investigation is designed
to answer the question 'What happens in System S?', or 'What happens, in
general?', or, more tentatively, 'What might be happening in System S or in
general?'; most modestly, an investigation is designed to answer the question
'What data can we get which may tell us what might be happening?'.