Information Retrieval Experiment, ed. Karen Sparck Jones (Butterworth & Company). Chapter: Laboratory tests of manual systems, by E. Michael Keen.

All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, including photocopying and recording, without the written permission of the copyright holder, application for which should be addressed to the Publishers. Such written permission must also be obtained before any part of this publication is stored in a retrieval system of any nature.

fears about the variable of searching were unfounded. Even the feeling that, for round three, each system should have taken its turn as the 'starting' system is not borne out, as Alphabetical played this role. These plots also represent a replicate search test and, to this writer, suggest that although the superiority of Uniterm may not often have been statistically significant, it was a reliable result that deserves to be taken more seriously than it was.

The main options open to the tester for regulating searching fall into the following four areas:

(1) Procedural instructions, e.g. what exactly is to be recorded and by what method; whether entries are to be screened for relevance or accepted en bloc; etc.
(2) Rules for the choice of terms, their combinations and the order of subsearches, e.g. fixed strategies used on all systems, with or without retrospective adjustment; free strategies devised by the searcher at the time; etc.
(3) Specification of any particular search target, e.g. high or low recall; a prescribed number of documents of a given relevance; actual identified documents to be sought; etc.
(4) Rules for search termination, e.g. whether termination is determined by reaching a specified target; regulated by the rules for term choice and combination; imposed by a time cutoff or limit; or left to the searcher's perception that the search should stop; etc.

Many methods and combinations have been used since Cranfield.
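The four areas can be read as the fields of a test protocol record, with area (4) acting as a stopping rule applied while a search is in progress. The Python sketch below is purely illustrative and is not taken from any of the tests discussed in this chapter: the names `Strategy`, `SearchProtocol` and `should_stop` are hypothetical, and the example instance only loosely follows the EPSILON main-test choices, with an assumed 30-minute time limit and invented stop reasons, since no such figures are given here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Strategy(Enum):
    """Area (2): how terms, combinations and subsearch order are chosen."""
    FIXED = "fixed strategy, same on all systems"
    FIXED_ADJUSTED = "fixed strategy with retrospective adjustment"
    FREE = "free strategy devised by the searcher at the time"


@dataclass(frozen=True)
class SearchProtocol:
    """Hypothetical record of one tester's choices in the four areas."""
    # Area (1): procedural instructions
    recording: str                      # what is recorded, and how
    screen_entries: bool                # True: screen for relevance; False: accept en bloc
    # Area (2): rules for term choice and combination
    strategy: Strategy
    # Area (3): search target; None means no particular target
    target: Optional[str]
    # Area (4): termination rules
    time_limit_min: Optional[float]     # upper time limit; None means no cutoff
    stop_reasons: Tuple[str, ...] = ()  # permitted reasons for the searcher to stop

    def should_stop(self, elapsed_min: float, target_reached: bool = False,
                    stated_reason: Optional[str] = None) -> bool:
        """Apply the area (4) rules to the current state of a search."""
        if self.time_limit_min is not None and elapsed_min >= self.time_limit_min:
            return True  # time used as an upper limit
        if self.target is not None and target_reached:
            return True  # a specified target has been reached
        # Searcher decides to stop, but only for one of the specified reasons
        return stated_reason is not None and stated_reason in self.stop_reasons


# Loosely EPSILON-like protocol; the time limit and reasons are assumptions.
epsilon_like = SearchProtocol(
    recording="everything except irrelevant entries",
    screen_entries=True,
    strategy=Strategy.FREE,
    target=None,
    time_limit_min=30.0,  # hypothetical value
    stop_reasons=("all promising headings checked", "no further terms to try"),  # hypothetical
)

print(epsilon_like.should_stop(12.0))  # False
print(epsilon_like.should_stop(31.0))  # True: time limit exceeded
print(epsilon_like.should_stop(12.0, stated_reason="no further terms to try"))  # True
print(epsilon_like.should_stop(12.0, target_reached=True))  # False: no target was set
```

Writing the protocol down as data in this way makes explicit which option was taken in each area, which is useful when comparing results from tests whose searching was regulated differently.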
The most difficult cases to control are those using free strategies in heuristic systems, such as printed indexes. In EPSILON the choices for the main tests were: full recording of everything except irrelevant entries; entries screened for relevance during the search; free strategies; no particular target specified; time used as an upper limit; and specified reasons for search termination. This crucial area of manual testing can only be refined and adjusted through the hard slog of trial and error: there are no neat answers awaiting discovery.

The search recording problem

Printed index searching is probably even more difficult to record than online interaction, since in the latter the resulting printout or search log can capture most of the process unobtrusively, as was done in Medusa¹⁷. Recording methods that disturb the searcher are probably unavoidable, an analogue of the Heisenberg principle, and the searcher's awareness of being observed may produce a kind of Hawthorne effect. Some of the different methods for recording freely devised strategies are:

(1) Searcher-compiled record sheets, varying in detail.
(2) Searcher-conducted marking of the index copy.
(3) Searcher verbalizing, using audio recording.
(4) Observation by a second party, either a person or a camera.
(5) Searcher retracing progress in a post-search interview.

The progress of a search against time is often needed, requiring a stop-clock or other timing device, or its derivation from audio or camera records. Torr, Fried and Prevel²⁹ concluded that a time-lapse camera was best for field testing of printed indexes, but apart from having to sit under a strongly lighted box