IRE
Information Retrieval Experiment
Laboratory tests of manual systems
chapter
E. Michael Keen
Butterworth & Company
Karen Sparck Jones
All rights reserved. No part of this publication may be reproduced
or transmitted in any form or by any means, including photocopying
and recording, without the written permission of the copyright holder,
application for which should be addressed to the Publishers. Such
written permission must also be obtained before any part of this
publication is stored in a retrieval system of any nature.
fears about the variable of searching were unfounded. Even the feeling that
for round three each system should have taken it in turn to be the 'starting'
system is not borne out as Alphabetical played this role. These plots also
represent a replicate search test, and to this writer suggest that though the
superiority of Uniterm may not often have been statistically significant it was
a reliable result that can be taken more seriously than it was.
The main options open to the tester to regulate searching fall into the
following four areas:
(1) Procedural instructions, e.g. what exactly is to be recorded and by what
method; are entries to be screened for relevance or accepted en bloc; etc.
(2) Rules for choice of terms, their combinations and the order for
subsearches, e.g. fixed strategies used on all systems, with or without
retrospective adjustment; free strategies devised by the searcher at the
time; etc.
(3) Specification of any particular search target, e.g. high or low recall; a
prescribed amount of documents of given relevance; actual identified
documents to be sought; etc.
(4) Rules for search termination, e.g. whether determined by reaching
specified target; regulated by the rules for term choice and combinations;
use of a time cutoff or limit; searcher's perception that search should stop;
etc.
Many methods and combinations have been used since Cranfield. The most
difficult cases to control are those using free strategies in heuristic systems,
such as printed indexes. In EPSILON the choice for the main tests was full
recording of everything except irrelevant entries, entries screened for
relevance during the search, free strategies, no particular target specified, use
of time as an upper limit and specified reasons for search termination. This
crucial area of manual testing can only be refined and adjusted by the hard
slog of trial and error: there are no neat answers awaiting discovery.
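As an illustration only, the four areas of regulation can be written down as a small protocol record, filled in here with the EPSILON main-test choices described above. This is a minimal sketch: the names SearchProtocol, Strategy and epsilon_protocol are hypothetical, and the actual time limit is not stated in the text, so it is left as a parameter.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Strategy(Enum):
    FIXED = "fixed"  # the same strategy used on all systems
    FREE = "free"    # devised by the searcher at the time


@dataclass(frozen=True)
class SearchProtocol:
    # (1) Procedural instructions
    recording: str
    screen_for_relevance: bool
    # (2) Rules for choice of terms and combinations
    strategy: Strategy
    # (3) Search target; None when no particular target is specified
    target: Optional[str]
    # (4) Rules for search termination
    time_limit_minutes: Optional[float]
    require_stop_reason: bool


def epsilon_protocol(time_limit_minutes: float) -> SearchProtocol:
    """EPSILON main-test choices as described in the text.

    The numeric time limit is an assumption supplied by the caller;
    the source does not give a value.
    """
    return SearchProtocol(
        recording="everything except irrelevant entries",
        screen_for_relevance=True,
        strategy=Strategy.FREE,
        target=None,
        time_limit_minutes=time_limit_minutes,
        require_stop_reason=True,
    )
```

Writing the choices down in one record makes it easier to compare the protocols of different tests area by area.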
The search recording problem
Printed index searching is probably more difficult to record even than online
interaction, since in the latter the resulting printout or search log can capture
most of the process unobtrusively, as done in Medusa17. Methods that are
disturbing to the searcher are probably unavoidable, corresponding to the
Heisenberg principle, and awareness of observation may cause a kind of
Hawthorne effect. Some of the different methods for recording freely-devised
strategies are:
(1) Searcher compiled record sheets, varying in detail.
(2) Searcher conducted marking of the index copy.
(3) Searcher verbalizing using audio recording.
(4) Observation by a second party, either person or camera.
(5) Searcher retracing progress in a post-search interview.
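To make the first method concrete, a searcher-compiled record sheet might be modelled as a list of timestamped events from which progress can be derived afterwards. The sketch below is an assumption for illustration, not the layout of any record sheet used in the tests discussed here; SearchRecord and its fields are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SearchRecord:
    # Each event: (seconds since search start, index entry consulted, judged relevant?)
    events: List[Tuple[float, str, bool]] = field(default_factory=list)

    def log(self, seconds: float, entry: str, relevant: bool) -> None:
        """Record one entry examined by the searcher."""
        self.events.append((seconds, entry, relevant))

    def progress(self) -> List[Tuple[float, int]]:
        """Cumulative number of relevant entries found at each recorded time."""
        found = 0
        curve = []
        for seconds, _entry, relevant in sorted(self.events, key=lambda e: e[0]):
            if relevant:
                found += 1
            curve.append((seconds, found))
        return curve
```

A sheet of this kind is only as detailed as the searcher makes it, which is the trade-off against the more intrusive methods in the list.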
The progress of a search against time is often needed, requiring a stop-clock
or timing device, or its derivation from audio or camera records. Torr, Fried
and Prevel29 concluded that a time-lapse camera was best for field testing of
printed indexes, but apart from having to sit under a strongly lighted box