Information Retrieval Experiment

Introduction
Karen Sparck Jones
Butterworth & Company

© Karen Sparck Jones. All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, including photocopying and recording, without the written permission of the copyright holder, application for which should be addressed to the Publishers. Such written permission must also be obtained before any part of this publication is stored in a retrieval system of any nature.

retrieval systems or their individual components. A good deal has been written about evaluation: what aspects of system performance, under the general headings of effectiveness and efficiency, should be evaluated and what form the evaluation should take. Evaluation implies experiment, or at least observation. But what to do and how to do it are distinct. For example, there is a difference between an interest in studying a method of indexing, evaluating it via recall, and actually carrying out the study by having certain documents indexed by certain people and searched in a certain way to satisfy certain needs. The general assumption tends to be that if you know what you want to evaluate, with given evaluation criteria, the appropriate experiment is obvious. Experience shows that this is not the case, because the characteristics of retrieval systems are so difficult to determine and their implication for experiment so difficult to identify.

Much has nevertheless been learnt in the last 20 years about the conduct of retrieval system tests; and this book attempts to bring the varied experience of its contributors to bear on all aspects of retrieval experiment. Part 1 deals with general topics applicable to all experiments and investigation: methodological issues, the relation between testing and evaluation, the problems presented by inaccessible system factors like meaning, and the proper provision of data.
Part 2 covers different types of test: real-life tests and laboratory tests, with a separate treatment of manual and automatic systems in each case; simulation tests, and gedanken experiments. Though the different types of test have much in common, they present distinct problems and demand different approaches in testing. In Part 3, specific retrieval tests are used to illustrate and amplify the points made in the previous sections. Chapter 12 considers the test history of the last 20 years through major and representative tests; Chapter 13 examines Cranfield 1 and 2; Chapter 14 analyses a specific experiment in detail, and Chapter 15 the long-term Smart Project.

In the book as a whole, the emphasis is on a detailed treatment of the issues involved in information retrieval testing, and especially on the less obvious problems occurring in the design and conduct of tests. Throughout, examples are used to illustrate the points made.

It will be evident that the theme of the book is a large one, with many facets. The chapters moreover embody individual views of their specific topics. For this reason, it is appropriate to conclude this Introduction with the interpretations of the key terms used which underlie all the chapters. These terms have been used in these senses so far, but to lead into Part 1 they need an explicit rather than implicit characterization. However, since the characterization is filled out by the book as a whole, all that is required here is a summary indication of the use of the key terms informing the different chapters.

Thus as this book is about information retrieval experiment, we need to say what is meant by 'experiment', and how it is related to 'investigation'. An experiment is designed to answer the question 'What happens if you do X?', or 'What happens if you do X rather than Y?'. An investigation is designed to answer the question 'What happens in System S?', or 'What happens, in general?', or, more tentatively, 'What might be happening in System S or in general?'; most modestly, an investigation is designed to answer the question 'What data can we get which may tell us what might be happening?'.