<DOC> 
<DOCNO> IRE </DOCNO>         
<TITLE> Information Retrieval Experiment </TITLE>         
<SUBTITLE> The methodology of information retrieval experiment </SUBTITLE>         
<TYPE> chapter </TYPE>         
<PAGE CHAPTER="2" NUMBER="10">                   
<AUTHOR1> Stephen E. Robertson </AUTHOR1>  
<PUBLISHER> Butterworth & Company </PUBLISHER> 
<EDITOR1> Karen Sparck Jones </EDITOR1> 
<COPYRIGHT MTH="" DAY="" YEAR="1981" BY="Butterworth & Company">   
All rights reserved.  No part of this publication may be reproduced 
or transmitted in any form or by any means, including photocopying 
and recording, without the written permission of the copyright holder, 
application for which should be addressed to the Publishers.  Such 
written permission must also be obtained before any part of this 
publication is stored in a retrieval system of any nature. 
</COPYRIGHT> 
<BODY> 

vast majority of experiments have used the scientific paper as the normal
unit.)
Request (or query) has usually been taken to mean the statement by the
requester describing his/her information need, but recently (particularly
with the development of on-line systems, which allow immediate feedback) it
has come to mean simply the act of requesting. It is usually assumed that this
act is stimulated by an underlying need for information, which in some sense
remains invariant, though the requester's perception and/or description of it
may change in the course of his/her interaction with the system.
  User and requester are synonymous. The notion of testing has already been
discussed, as has the distinction between experiment and investigation, in the
editor's introduction.
  A distinction is usually made between systems for current awareness or the
selective dissemination of information (SDI) and those for retrospective
retrieval. In terms of the mechanics of the system, in retrospective retrieval
a request is made to a system as a one-off occurrence, and searched against
the current collection of documents; in SDI, repeated searches are made
against successive additions to the document collection, over a period of time.
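  To make the mechanical distinction concrete, the following sketch (in
Python, purely for illustration; the matching rule, the data and all the
names used here are invented for the example, and are not drawn from any
actual system) contrasts the two modes of operation:

    # A minimal sketch of the mechanical difference between retrospective
    # retrieval and SDI. The match() predicate is a stand-in for whatever
    # searching logic a real system would use.

    def match(request, document):
        # Hypothetical matching rule: any request term occurs in the document.
        return any(term in document.split() for term in request.split())

    # Retrospective retrieval: a one-off request searched against the
    # current collection as it stands.
    def retrospective_search(request, collection):
        return [doc for doc in collection if match(request, doc)]

    # SDI: the request persists as a standing profile, and each successive
    # batch of additions to the collection is searched against it over time.
    def sdi_service(profile, batches_of_additions):
        for batch in batches_of_additions:
            yield [doc for doc in batch if match(profile, doc)]

    collection = ["retrieval experiment design", "catalogue maintenance"]
    print(retrospective_search("retrieval experiment", collection))

    additions = [["new retrieval test"], ["library automation survey"]]
    for notified in sdi_service("retrieval", additions):
        print(notified)

The point of the contrast is simply that in SDI the request outlives any
single search, while in retrospective retrieval the collection is fixed at
the moment the request is made.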
  What is the purpose or function of an information retrieval system: what
is it supposed to do? The simple answer to this question is to retrieve
documents in response to requests; but this is too simple, since any arbitrary
gadget could do that. The documents must be of a particular kind: that is,
they must serve the user's purpose. Since (we assume) the user's purpose is to
satisfy an information need, we might describe the function of an information
retrieval system as `leading the user to those documents that will best enable
him/her to satisfy his/her need for information'.
  There are many different aspects or properties of a system that one might
want to measure or observe, but most of them are concerned with the
effectiveness of the system, or its benefits, or its efficiency. Effectiveness is how
well the system does what it is supposed to do; its benefits are the gains
deriving from what the system does in some wider context; its efficiency is
how cheaply it does what it does. In this book, we are mainly, but not
exclusively, concerned with the effectiveness or (synonymously) the
performance of information retrieval systems.

Why test information retrieval systems?
This book is mainly about the `how' of testing. But before we launch into the
technicalities of how best to conduct a test, we should (at least briefly)
consider the prior question of why.
  Starting from the simplest situation, suppose that we have a specified
clientele and document collection, and two existing information retrieval
systems working in parallel, and we wish to decide which of the two to drop.
Then we could imagine conducting a formal experiment to help us make this
particular decision. In principle, such testing would be relatively
straightforward: with a well-defined, specific question to answer, we would
have the
ideal experimental situation.
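  The shape of such a comparison can be put in outline: the same requests
are put to both systems, and the per-request scores on some chosen
effectiveness measure are compared. The sketch below (again in Python; the
systems, the requests and the measure are all invented placeholders, and the
sketch shows only the comparative design, not any particular measure) makes
this explicit:

    # A sketch of the simplest comparative experiment: the same requests
    # are put to two systems serving the same clientele and collection,
    # and the per-request effectiveness scores are compared.
    # effectiveness() is a placeholder for whichever measure the
    # experimenter has chosen.

    def run_experiment(requests, system_a, system_b, effectiveness):
        wins_a = wins_b = ties = 0
        for request in requests:
            score_a = effectiveness(system_a(request))
            score_b = effectiveness(system_b(request))
            if score_a > score_b:
                wins_a += 1
            elif score_b > score_a:
                wins_b += 1
            else:
                ties += 1
        # The decision of which system to drop rests on these counts
        # (together, in practice, with a significance test).
        return wins_a, wins_b, ties

    # Toy demonstration with invented systems and precision as the measure.
    sys_a = lambda req: {"relevant", "other"}   # pretend retrieved sets
    sys_b = lambda req: {"relevant"}
    relevant = {"relevant"}
    precision = lambda retrieved: len(retrieved & relevant) / len(retrieved)
    print(run_experiment(["q1", "q2"], sys_a, sys_b, precision))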
  The problems become more complex if, instead of two alternative systems,
we have one system which we think might be capable of improvement. In
this situation, we might for instance want to evaluate how well it performs

</BODY>                  
</PAGE>                  
</DOC>