IRE Information Retrieval Experiment
Chapter: Laboratory tests of manual systems
E. Michael Keen
Butterworth & Company
Karen Sparck Jones

All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, including photocopying and recording, without the written permission of the copyright holder, application for which should be addressed to the Publishers. Such written permission must also be obtained before any part of this publication is stored in a retrieval system of any nature.

judgements may have been made on its basis by another person. The benchmarks for relevance and searching cannot be allowed to vary.

Searchers for laboratory tests usually have to be specially recruited and paid, and for practical reasons students are often used. Some knowledge of the subject area of the test collection and requests is a usual requirement. Less commonly used are subject experts of long standing, or librarians/intermediaries having little formally acquired subject expertise. Several search sessions may have to be held, and all the usual care is needed to create near identical experimental circumstances.

Comparability of search strategies

All types of comparative test need to regulate and control the search strategies used so that no unwanted variation in performance will bias the results. In Swanson's test no explanation was given of the practice followed to make searches comparable, so it is difficult to judge what the result represents, though clearly very little control was imposed. By contrast the INSPEC printed indexes comparison adopted virtually total control over every aspect of searching by requiring a flowchart to be rigorously followed. This led to problems, however, with the flowchart suiting one of the indexes better than the others, so failing to achieve the intended neutrality.
Another problem was that, since such artificial search procedures were used, realistic times could not be obtained, and the attempt to use standard times was not successful.

Another test using fixed strategy methods was that of Case Western Reserve University, where searches designed for use on reasonably exhaustive indexing were applied also to titles, thus failing to match any documents at all with large numbers of the queries and giving very low recall indeed. In fact the searchers sometimes compensated for title searching by dropping terms from the formulation, but such departures were spotted as contrary to the rules. This result was therefore no realistic test of title searching at all.

Cranfield 1 saw clearly the search strategy problem and tackled it in several ways. The first round of testing encountered the problem of how long a searcher was justified in continuing a search when the one relevant source document was known to be in the file somewhere. Also, since scoring was to include the number of subsearches required to find the source document, the problem was to decide exactly what constituted a different subsearch, particularly given the different kinds of entries in the four index languages. So search round two prescribed a limit beyond which the search strategy could not be broadened, and also defined a subsearch both in general and in terms of each system in as fair a manner as possible. Although this gave results that were taken to be quite satisfactory for the main variables under test, an analysis of system failures revealed cases where a search on one system had succeeded but the same query on another system had failed due to search formulation. Also, it was said 'it appeared often a matter of chance whether the correct programme (subsearch) was used on the first or fifth searches'.
So the third round of searching was designed to 'eliminate as far as possible the variable of searching'2 by adopting a standardized and fixed strategy for all four systems. This was done by making an initial free-mode search always on one of the systems as the yardstick, then applying the strategy in an appropriate identical manner to the other three. If one of these later systems