Information Retrieval Experiment
Edited by Karen Sparck Jones
Butterworth & Company

2 The methodology of information retrieval experiment
Stephen E. Robertson

2.1 Introduction

Information retrieval systems have been the subject of experimental testing for some twenty years now. As in any field in this position, a fair amount of know-how has accumulated about the proper conduct of such investigations. The object of this book is to distil this know-how; the object of this chapter is to set the scene. Thus I will be introducing the basic ideas, sketching in some of the main problem areas, and generally preparing the reader for the more specific or concrete chapters that follow. Van Rijsbergen takes up the question of the evaluation of retrieval effectiveness in Chapter 3; in Chapter 4, Belkin considers information retrieval in a wider context; and in Chapter 5, Tague gets down to the detail of conducting experiments.

Definitions

What, then, do we intend to convey by this general rubric, the 'experimental testing of information retrieval systems'?

Information retrieval is generally taken to mean the retrieval of references to documents in response to requests for information (more about documents and requests below).
An information retrieval system is a set of rules and procedures, as operated by humans and/or machines, for doing some or all of the following operations:

    Indexing (or constructing representations of documents);
    Search formulation (or constructing representations of information needs);
    Searching (or matching representations of documents against representations of needs);
    Feedback (or repeating any or all of the above processes, with modifications introduced in response to an assessment of the results of some process);
    Index language construction (or the generation of rules of representation).

Document is (in theory, at least) taken as more-or-less synonymous with text in linguistics; that is, it describes any piece of linguistic (in the widest sense) material that can reasonably be considered as a unit. (In practice, the