IRE
Information Retrieval Experiment
The methodology of information retrieval experiment
chapter
Stephen E. Robertson
Butterworth & Company
Karen Sparck Jones
2
The methodology of information retrieval experiment
Stephen E. Robertson
2.1 Introduction
Information retrieval systems have been the subject of experimental testing
for some twenty years now. As in any field in this position, a fair amount of
know-how has accumulated about the proper conduct of such investigations.
The object of this book is to distil this know-how; the object of this chapter
is to set the scene. Thus I will be introducing the basic ideas, sketching in
some of the main problem areas, and generally preparing the reader for the
more specific or concrete chapters that follow.
Van Rijsbergen takes up the question of the evaluation of retrieval
effectiveness in Chapter 3; in Chapter 4, Belkin considers information
retrieval in a wider context; and in Chapter 5, Tague gets down to the detail
of conducting experiments.
Definitions
What, then, do we intend to convey by this general rubric, the 'experimental
testing of information retrieval systems'?
Information retrieval is generally taken to mean the retrieval of references to
documents in response to requests for information (more about documents
and requests below). An information retrieval system is a set of rules and
procedures, as operated by humans and/or machines, for doing some or all of
the following operations:
Indexing (or constructing representations of documents);
Search formulation (or constructing representations of information needs);
Searching (or matching representations of documents against representations
of needs);
Feedback (or repeating any or all of the above processes, with modifications
introduced in response to an assessment of the results of some process);
Index language construction (or the generation of rules of representation).
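
As a rough illustration only, the five operations can be sketched in a few lines of Python. Everything in the sketch is an assumption made for the purpose: the function names, the stopword list, the bag-of-words representation and the term-overlap score are hypothetical, and stand for no particular system or index language.

```python
from collections import Counter

# Hypothetical stopword list, chosen only for this illustration.
STOPWORDS = {"a", "an", "the", "of", "in", "for", "and", "to", "on"}


def represent(text):
    """Index language construction: the rule of representation.
    Here (an assumption) a bag of lowercase, non-stopword terms."""
    return Counter(w for w in text.lower().split() if w not in STOPWORDS)


def index(documents):
    """Indexing: construct a representation of each document."""
    return {doc_id: represent(text) for doc_id, text in documents.items()}


def formulate(request):
    """Search formulation: construct a representation of the need."""
    return represent(request)


def search(query, doc_index):
    """Searching: match document representations against the query,
    ranking by term overlap (ties broken by document identifier)."""
    scores = {d: sum((query & rep).values()) for d, rep in doc_index.items()}
    hits = [d for d, s in scores.items() if s > 0]
    return sorted(hits, key=lambda d: (-scores[d], d))


def feedback(query, doc_index, judged_relevant):
    """Feedback: modify the query in response to an assessment of the
    results, by adding the terms of documents judged relevant."""
    expanded = Counter(query)
    for doc_id in judged_relevant:
        expanded.update(doc_index[doc_id].keys())
    return expanded


documents = {
    "d1": "evaluation of retrieval effectiveness",
    "d2": "indexing languages for document retrieval",
    "d3": "statistics for experiment design",
}
doc_index = index(documents)
query = formulate("retrieval evaluation")
first_pass = search(query, doc_index)

# Suppose the requester judges d2 relevant; the search is then repeated.
revised = feedback(query, doc_index, ["d2"])
second_pass = search(revised, doc_index)
print(first_pass, second_pass)
```

In this toy collection the first search ranks d1 above d2, and the search repeated after feedback on d2 reverses that order. The point is merely that each operation is a separable component, any one of which might be varied in an experiment while the others are held fixed.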
Document is (in theory, at least) taken as more-or-less synonymous with
text in linguistics; that is, it describes any piece of linguistic (in the widest
sense) material that can reasonably be considered as a unit. (In practice, the