IRE Information Retrieval Experiment Retrieval system tests 1958-1978 chapter Karen Sparck Jones Butterworth & Company

in retrieval searching is more important than indexing, and so are difficult to test. In some cases, hypothesis may be of the weakest kind illustrated by any statement that some variable must be important so its behaviour is worth study; the requirement is then to find out why it is important. Statements to the effect that a certain value of some variable is superior to another value, or that one value is precisely twice as good as another, are then progressively stronger hypotheses. Again, a major problem for information retrieval research in the past two decades has been that of formulating testable explanatory hypotheses about information system behaviour, and especially hypotheses given a definite interpretation by a formal model. The distinction between the explanatory hypotheses of experiment and the descriptive hypotheses of investigation is not always easy to maintain. Explanatory hypotheses about the behaviour of a retrieval system ultimately refer to the way it functions in relation to its purpose, i.e. to its performance. Descriptive hypotheses are often assumed to have some connection with system function, but the nature of the connection may be far from clear. Descriptive hypotheses may indeed be tested, but in such cases they are either implying explanatory hypotheses or referring to certain system elements simply as data.
Bibliometric and also user studies are examples of descriptive hypothesis tests. Thus bibliometric studies may be concerned to test hypotheses about the distribution of citations in journals, or of citation links between papers, with the test variable the subject area of a literature, for instance. However, in interpreting such tests we either have a presumption that describing the structure of a literature has some bearing on retrieval system behaviour, or are in fact concerned with another type of information system as a phenomenon for study. Information retrieval research over the past twenty years could perhaps be described as a long and not altogether successful attempt to convert descriptive hypotheses into explanatory ones.

12.2 Approaches to the historical review

There are thus various ways in which the experimental work of the past two decades can be treated. One possibility is a straightforward historical account; another is a review focusing on the development of methods of experiment (or lack of it); and yet another is a characterization of the research in terms of the attempt to generate theories and models motivating experiments. The last two taken together would indicate the quality of experimental work in information retrieval. There are, however, further possibilities. One is to survey the experiments done by topic, i.e. to consider what particular questions within the whole range of questions that could be asked about document retrieval systems have attracted most attention, or produced the most significant results. The other possibility is to consider the experiments done in terms of their influence, actual or potential, on operational systems. There are in fact no very clear patterns to be seen, since experiments important on one count may not be so on others: for example, we can have a methodologically sound experiment concerned with an unimportant question, or a good experiment without influence.
There are, however, some major studies of importance for more than one reason, like Cranfield 1 and 2 [1-3], or Salton's Medlars test [4]; and though the overall pattern is not very clear, the general colour of the cloth is plain, and there are some differently