The decade 1968-1978
from the point of view of test evaluation, usefully organized by the five
specific topics.
The work of 1968-1978 is more variegated than that of the previous
decade, and it is less easy to describe in a tidy way. Concrete comparisons
between tests are more difficult to make, and comprehensive generalizations
about groups of tests cannot always be provided. In some groups no tests
stand out as especially important; however, for the decade as a whole we can
single out the tests or sets of tests done by Aitchison et al.49, Keen51,52,
Vaswani and Cameron78, Miller68-70, UKCIS53,54,71-73, and perhaps
Sparck Jones75-77,79,80,93-95 as significant in terms of scope, conduct or
result. Some tests, like Aitchison et al.'s and Keen's, resembled Cranfield 2
in touching on a wide range of questions. The Smart Project work as a whole
is very important4,63-66,96-98.
Turning now to individual tests, or more particularly experiments, both
important and representative, the question is what changes and developments
are detectable in their objectives, forms, and results. As in the discussion of
the work of the decade 1958-1968, the tests will be considered first from the
substantive, and then from the methodological points of view; but in this
case all five groups will be treated substantively before methodological
questions are considered.
Index language tests
The tests in the first group were focused on comparisons between different
indexing languages. This group is exemplified by Jahoda and Stursa's test99,
Cleverdon's three tests87,100,101, and those of Aitchison et al.49, Barker et
al.53,54, Olive et al.50, and Keen51,52. Jahoda and Stursa compared single
subject access with a KWIC index, Cleverdon compared controlled thesaurus-type
languages with natural language, Keen's ISILT several controlled languages
and natural language, and Aitchison et al. and Barker et al. chiefly different
natural language texts like titles and abstracts. Smart Project experiments
included manual controlled versus automatic natural language comparisons
in the Medlars tests4, and Miller, in working on searching, tested controlled
MeSH versus natural language68,69; Evans compared manually and
automatically assigned thesaurus terms86, and Klingbiel and Rinker manual
and semi-automatic natural language indexing85. Keen's printed subject
index investigation, EPSILON, though its emphasis is on searching, can also
be regarded as partly a language test60.
The most conspicuous feature of these tests is the inclusion of natural
language; index language tests in the previous decade were typically confined
to different forms of controlled language. The inclusion of natural language,
represented either by manually selected keywords or by automatically
searchable titles or abstracts, must be seen as responding in part to the
findings of earlier projects like Cranfield 2 (this was indeed explicitly the case
in, for example, Aitchison et al.'s test), and in part to the increasing use of
machine files for which title searching in particular is especially appropriate.
The cost of using a controlled language with very large files, whether for
indexing or searching, must be a contributing factor too. Some of the tests,
like Cleverdon's DOAE test100, and Keen's, explicitly covered dependent
variables like indexing exhaustivity, and Aitchison et al. included question