IRE
Information Retrieval Experiment
Retrieval system tests 1958-1978
chapter
Karen Sparck Jones
Butterworth & Company
experiments. An important difference between the tests in the two groups
was that on the whole the tests on manual indexing were evaluation tests of
system performance, while more of the work on automatic indexing was
concerned solely with demonstrating that automatic indexing was feasible
and produced plausible output: this applies to the studies done by O'Connor,
Borko, Stone and Rubinoff, to A. D. Little's NASA study, and indeed to most
of the work reported in the two Stevens volumes. Within the two groups
some tests can be described as experiments proper, involving some degree of
control of variables and explicit comparison, ordinarily between different
indexing languages. The Cranfield and CWRU studies fall into this category.
Others, like Lancaster's Medlars test, were investigations. Some of the
comparative tests, like those of Schuller or Cohen et al., as well as the
investigations of Lancaster, were directly related to operational systems;
others, including virtually all of the work on automatic indexing, were
laboratory studies.
The fact that in most of the work the emphasis was on the indexing
language used gives the research of the decade a distinctive character. Indeed
1968 genuinely marks the end of one phase of research. For manual indexing,
it could be called the Cranfield decade. The relatively uncontrolled
comparisons of Cranfield 1 were followed by the more detailed tests of
Cranfield 2. The experience gained in the different Cranfield projects was,
moreover, applied elsewhere: for example, in the Herner et al. study of the
Bureau of Ships system, in the Medlars investigation, and in the CWRU Comparative
Systems Laboratory work. The CWRU Report of 1968 [14] essentially constitutes an
extended presentation of the testing methods developed in this whole context,
and can be said to summarize the experience gained during the period. At the
same time it was evident by 1968 that automatic indexing raised more
problems than had been expected: the effort and difficulty involved in
conducting well-organized and informative tests was clearly shown by
Dennis' heroic experiments. Salton's book, Automatic Information Organisation
and Retrieval [30], nevertheless marked the beginning of a new period since
it emphasized the whole range of novel possibilities for information retrieval
systems made available by computers, and the importance of viewing an
automated information system as an integrated whole. Overall, the conclusion
to be drawn from the work of the decade was expressed in the CWRU
Report: the indexing language used is much less important in determining
system performance than had been supposed.
Given this general characterization of tests between 1958 and 1968, we can
now consider the objective, form and result of the different projects in more
detail. This will involve both the substantive and the methodological
properties of the tests. In the discussion I shall treat the two groups of manual
and automatic indexing tests separately since, as already noticed, they were
very different in character.
Index language tests
We start with the indexing language tests, and first consider them
substantively. As noted, these tests included the classical evaluation studies
carried out at Cranfield and Western Reserve. The Cranfield 1 [1] and 2 [2, 3]