IRE
Information Retrieval Experiment
Retrieval system tests 1958-1978
chapter
Karen Sparck Jones
Butterworth & Company
in retrieval searching is more important than indexing, and so are difficult to
test. In some cases, the hypothesis may be of the weakest kind, illustrated by any
statement that some variable must be important so its behaviour is worth
study; the requirement is then to find out why it is important. Statements to
the effect that a certain value of some variable is superior to another value,
or that one value is precisely twice as good as another, are then progressively
stronger hypotheses. Again, a major problem for information retrieval
research in the past two decades has been that of formulating testable
explanatory hypotheses about information system behaviour, and especially
hypotheses given a definite interpretation by a formal model. The distinction
between the explanatory hypotheses of experiment and the descriptive
hypotheses of investigation is not always easy to maintain. Explanatory
hypotheses about the behaviour of a retrieval system ultimately refer to the
way it functions in relation to its purpose, i.e. to its performance. Descriptive
hypotheses are often assumed to have some connection with system function,
but the nature of the connection may be far from clear. Descriptive
hypotheses may indeed be tested, but in such cases they are either implying
explanatory hypotheses or referring to certain system elements simply as
data. Bibliometric and also user studies are examples of descriptive
hypothesis tests. Thus bibliometric studies may be concerned to test
hypotheses about the distribution of citations in journals, or of citation links
between papers, with the test variable the subject area of a literature, for
instance. However, in interpreting such tests we either presume
that describing the structure of a literature has some bearing on retrieval
system behaviour, or are in fact concerned with another type of information
system as a phenomenon for study. Information retrieval research over the
past twenty years could perhaps be described as a long and not altogether
successful attempt to convert descriptive hypotheses into explanatory ones.
12.2 Approaches to the historical review
There are thus various ways in which the experimental work of the past two
decades can be treated. One possibility is a straightforward historical
account; another is a review focusing on the development of methods of
experiment (or lack of it); and yet another is a characterization of the research
in terms of the attempt to generate theories and models motivating
experiments. The last two taken together would indicate the quality of
experimental work in information retrieval. There are, however, further
possibilities. One is to survey the experiments done by topic, i.e. to consider
what particular questions within the whole range of questions that could be
asked about document retrieval systems have attracted most attention, or
produced the most significant results. The other possibility is to consider the
experiments done in terms of their influence, actual or potential, on
operational systems. There are in fact no very clear patterns to be seen, since
experiments important on one count may not be so on others: for example, we
can have a methodologically sound experiment concerned with an unimportant
question, or a good experiment without influence. There are, however,
some major studies of importance for more than one reason, like Cranfield 1
and 2¹⁻³, or Salton's Medlars test⁴; and though the overall pattern is not very
clear, the general colour of the cloth is plain, and there are some differently