IRE
Information Retrieval Experiment
Retrieval system tests 1958-1978
chapter
Karen Sparck Jones
Butterworth & Company
Karen Sparck Jones
All rights reserved. No part of this publication may be reproduced
or transmitted in any form or by any means, including photocopying
and recording, without the written permission of the copyright holder,
application for which should be addressed to the Publishers. Such
written permission must also be obtained before any part of this
publication is stored in a retrieval system of any nature.
coloured threads to be traced, some of them even standing out as brightly
coloured against the overall grey brown.
In my view the questions experiments seek to answer should be viewed as
bearing on the quality of experiments. Thus we can evaluate experiments in
terms of method, hypothesis, and research or application status, and also in
terms of information retrieval system concern: some aspects of document retrieval
systems are more central and important than others, for example searching
and matching as opposed to the quality of abstracts used as a basis for
indexing, or the convenience of the online searcher's terminal. The core of an
information retrieval system is the document access information, i.e. the
character of the indexing data and search mechanisms available. The
character of the users, of the literature, of the physical and administrative
plant, and so on, represent progressively more peripheral environments of
the indexing and search functions. We may therefore, other things being
equal, rate studies concerned with the core of an information retrieval system
as more important than those directed at the periphery.
The influence of the experiments which have been carried out can, on the
other hand, be dealt with by an historical account. A chronicle version of
retrieval experiment does not match the logical characterization just
described particularly well, so an historical account of testing is required to
balance an evaluative one. The choice and sequence of experiments has
naturally been influenced by the challenges posed by the findings of specific
tests, but it has also been affected by developments in operational systems
and by broader changes in attitudes to information system provision.
The remainder of the chapter will therefore be organized as follows. I shall
first provide a summary view of the history of information retrieval
experiment in its wider context, mentioning noteworthy tests in passing. I
shall then consider these and other representative experiments from an
evaluative point of view, in relation to their objectives, i.e. their focus,
motivation, and underlying assumptions; in relation to their forms, i.e.
broadly speaking their data and conduct, which can be itemized utilizing
Bourne's useful scheme⁵ as covering
(1) corpus size (requests and documents) and subject,
(2) source of the requests,
(3) degree of request negotiation with the user,
(4) number of relevance levels (excluding non-relevance),
(5) status of the relevance judges and basis of their judgements,
(6) performance measures;
and in relation to their results, i.e. their findings, the interpretation given to
these findings, and their implications. This evaluation will be primarily
retrospective, but some reference to what the experiments looked like at the
time may be appropriate. Overall this survey will seek to show whether and
how experiments have changed in their objective or type of objective, their
form, and their results, and more particularly if any changes reflect a growth
of experience in the conduct of information retrieval tests and in the
understanding of retrieval systems. Following the detailed discussion I shall
summarize the main features of the test work done, viewed as a whole. It
turns out that the research of the period covered by the chapter can be
naturally divided into that of the decade 1958-1968, and that of the decade