IRE
Information Retrieval Experiment
Laboratory tests of manual systems
chapter
E. Michael Keen
Butterworth & Company
Karen Sparck Jones
judgements may have been made on its basis by another person. The
benchmarks for relevance and searching cannot be allowed to vary.
Searchers for laboratory tests usually have to be specially recruited and
paid, and for practical reasons students are often used. Some knowledge of
the subject area of the test collection and requests is a usual requirement.
Less commonly used are subject experts of long standing, or
librarians/intermediaries having little formally acquired subject expertise.
Several search sessions may have to be held, and all the usual care is needed
to create near-identical experimental circumstances.
Comparability of search strategies
All types of comparative test need to regulate and control the search
strategies used so that no unwanted variation in performance biases the
results. In Swanson's test no explanation was given of the practice followed
to make searches comparable, so it is difficult to judge what the result
represents, though clearly very little control was imposed. By contrast the
INSPEC printed indexes comparison adopted virtual total control over every
aspect of searching by requiring a flowchart to be rigorously followed. This
led to problems, however, with the flowchart suiting one of the indexes better
than the others, and so failing to achieve the intended neutrality. A further
problem was that, since such artificial search procedures were used, realistic
times could not be obtained, and the attempt to use standard times was not
successful. Another test using fixed-strategy methods was that of Case
Western Reserve University, where searches designed for use on reasonably
exhaustive indexing were also applied to titles; these failed to match any
documents at all for large numbers of the queries and gave very low recall
indeed. In fact the searchers sometimes compensated for title searching by
dropping terms from the formulation, but these departures were spotted as
contrary to the rules. This result was therefore no realistic test of title
searching at all.
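The title-searching difficulty can be illustrated with a minimal sketch. The documents, index terms, title words, and query below are all hypothetical and are not taken from the Case Western Reserve test; the sketch assumes a simple conjunctive formulation in which every query term must be present, and measures recall as the fraction of relevant documents retrieved.

```python
# Toy illustration only: all documents, terms, and the query are hypothetical.

def matches(query_terms, doc_terms):
    """Fixed conjunctive strategy: every query term must be present."""
    return set(query_terms) <= set(doc_terms)

def recall(query_terms, representations, relevant_ids):
    """Fraction of the relevant documents retrieved by the formulation."""
    retrieved = {doc_id for doc_id, terms in representations.items()
                 if matches(query_terms, terms)}
    return len(retrieved & relevant_ids) / len(relevant_ids)

# The same three documents, represented two ways.
exhaustive_index = {
    "d1": {"alloy", "fatigue", "temperature", "titanium"},
    "d2": {"alloy", "fatigue", "temperature", "steel"},
    "d3": {"alloy", "corrosion", "steel"},
}
titles = {
    "d1": {"fatigue", "titanium"},
    "d2": {"fatigue", "steel"},
    "d3": {"corrosion", "steel"},
}
relevant = {"d1", "d2"}

# A formulation designed for the exhaustive indexing.
query = ["alloy", "fatigue", "temperature"]

recall_exhaustive = recall(query, exhaustive_index, relevant)
recall_titles = recall(query, titles, relevant)

# Dropping terms, as the searchers sometimes did, restores matches on titles.
recall_titles_reduced = recall(["fatigue"], titles, relevant)

print(recall_exhaustive, recall_titles, recall_titles_reduced)
```

In this toy case the unchanged formulation retrieves both relevant documents from the exhaustive index but none from the titles, whereas the shortened formulation retrieves both from the titles. This shows why a strategy fixed for one kind of document representation need not be a fair test of another.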
Cranfield 1 saw clearly the search strategy problem and tackled it in
several ways. The first round of testing encountered the problem of how long
a searcher was justified in continuing a search when the one relevant source
document was known to be in the file somewhere. Also, since scoring was to
include the number of subsearches required to find the source document, the
problem was to decide exactly what constituted a different subsearch,
particularly the different kinds of entries in the four index languages. So
search round two prescribed a limit beyond which the search strategy could
not be broadened, and also defined a subsearch both in general and in terms
of each system in as fair a manner as possible. Although this gave results that
were taken to be quite satisfactory for the main variables under test, an
analysis of system failures revealed cases where a search on one system had
succeeded but the same query on another system had failed due to search
formulation. Also, it was said `it appeared often a matter of chance whether
the correct programme (subsearch) was used on the first or fifth searches'. So
the third round of searching was designed to `eliminate as far as possible the
variable of searching'2 by adopting a standardized and fixed strategy for all
four systems. This was done by making an initial free-mode search always on
one of the systems as the yardstick, then applying the strategy in an
appropriate identical manner to the other three. If one of these later systems