IRE
Information Retrieval Experiment
Laboratory tests of manual systems
chapter
E. Michael Keen
Butterworth & Company
Karen Sparck Jones
conditions can provide much valuable data even in the realm of retrieval
failure analysis. They also rode the storm caused by the unexpected and
unwelcome outcome of the comparison, and made people face the possibility
that complexity and intelligence at input may not yield a superior result
at retrieval.
Though Cranfield 2 used machine-like search procedures in testing 29
index language devices [3, 4], we may say that the findings about the
effectiveness of natural language (either indexing or titles) apply to manual
systems as well, unless practical considerations of file or vocabulary size
inhibit. The subset of 63 requests and 200 documents from the Cranfield 2
aerodynamics collection has probably become the most heavily used test
collection. The ISILT experiments [9, 10], like Cranfield 1, took the untested
debates of the day (minimum vocabulary post-coordinate systems for
example) and once again tried to provide measured results to replace
unmeasured opinion.
Two large index language comparisons that utilized manual indexing and
search formulation, but machine searching, were the Case-Western Reserve
University test [11] and Tom Aitchison's INSPEC work [12]. Many small-scale
tests were carried out on the need for syntactic devices (e.g. links and roles)
in index languages, and these culminated in tests of the relational indexing
system carried out by Jason Farradane [13] and in ISILT.
Indexing and searching experiments
Index language testing has dominated the main thrust of laboratory
investigations in spite of the evidence of Cranfield 1 that it is the operations
of indexing and searching that matter most. No large-scale laboratory
experiments have tackled these two processes as primary variables, though
many tests have experimented with them as secondary variables: all the large
index language tests mentioned did so. Cranfield 1 is a classic in this respect.
The 18 000 documents, which were indexed by four languages, were built up
from batches of carefully selected components. There were different types of
document (articles, reports, book sections, etc.), general or specialist subject
areas, five time limits allowed for indexing, and individual performance in
indexing and searching was related to level of experience and the use of
subject specialists versus librarians.
Clearly defined parameters of exhaustivity and specificity as they affect
both indexing and searching were explored in Cranfield 2 and ISILT. The
comparison between pre- and post-coordinate search files was systematically
tackled in ISILT, and the phenomenon of 'preserving the context' by multiple
specific pre-coordinate entry was carried over from ISILT to the printed
index experiments known as EPSILON [14-16]. There have also been numbers
of smaller projects in which just one of the two processes has been studied,
but most such studies on indexing quality or consistency have not reached the
status of valid evaluation testing.
Turning to tests of the search process, many laboratory experiments have
employed very strict controls. It is true that in operational tests manual
search formulation and strategy can vary dramatically from person to person,
as was clearly seen in the Medusa current awareness work [17]. One
experimental method is to obtain results by progressively broadening the