IRE Information Retrieval Experiment Laboratory tests of manual systems chapter E. Michael Keen Butterworth & Company Karen Sparck Jones All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, including photocopying and recording, without the written permission of the copyright holder, application for which should be addressed to the Publishers. Such written permission must also be obtained before any part of this publication is stored in a retrieval system of any nature.

superiority of the manual approach in precision at medium and low recall is a more reliable finding (at similar levels of indexing exhaustivity), it must be remembered that term weighting procedures developed more recently by K. Sparck Jones and G. Salton may well have narrowed or removed that gap. The trouble is: we just don't know, and this kind of test comparison is a major challenge to testing ingenuity.

Printed index comparisons

Tests in this category could be regarded as experiments into index languages, indexing or searching, but they are separated out here because page format indexes seem to have been absent from experiments of these types. Printed indexes have also been late in being tackled by evaluators. This may be due to the relative satisfaction with their apparent performance on the part of their users. Or, it may be the great difficulties that face the tester of such heuristically-flexible page-scanning searching practices. Even in tight laboratory conditions the INSPEC comparison20 admitted that methodological problems overlaid their test results. The Aberystwyth Off-shelf test6 of published indexes covering library and information science had to face similar problems, and it was not possible to prove that the dissimilar document collections covered by the six indexes had not influenced the results unevenly (see Figure 8.6 later).
One useful experiment21,22 conducted in an operational environment compared nine subject catalogues in various formats, including printed. The hazards faced by a deeper laboratory test such as EPSILON14-16 are that the care in control over the construction of the indexes may allow one index to exert undue influence on the others. However, this was the only satisfactory way to tackle, once again, the arguments of the 1970s concerning the efficacy of subject indexes constructed by schemes such as chain procedure, PRECIS, articulated, and rotated (KWAC). The foci of the comparisons made were:

(1) Full versus no context, as preserved entry systems versus one with lead terms only.
(2) Direct versus indirect entry, as multiple entry rotated (e.g. KWAC) versus chain procedure (e.g. British Technology Index).
(3) Full versus partial provision of function words, as KWAC or articulated versus rotated term or PRECIS.
(4) Active versus predominantly fragmented term order, as KWAC or rotated versus articulated or PRECIS.

The findings of these comparisons will be indicated briefly in the list given later. Tackling the printed index page-scanning mode deepened the problem of recording and analysing manual searching, and led in EPSILON to the use of audio-recording, index copy marking and a test technique involving scanning only selected index portions in order to measure accuracy in entry relevance prediction. This was the first time the criterion of presentation clarity was the subject of an experiment, and it would seem that this criterion, appropriate to most information retrieval systems, has been missed out of previous work.