Information Retrieval Experiment, edited by Karen Sparck Jones (Butterworth & Company). Chapter 8: Laboratory tests of manual systems, by E. Michael Keen.

Stopping points can be varied in searches so that a path or search curve can be plotted. This has been done not only by using machine-like levels of term matching, but also in freely pursued manual searches: by varying the stopping points, the different recall needs of users can be simulated. Thus in ISILT and the printed index tests the minimum point was the need for a single highly relevant document, proceeding through intermediate points to the need for all available documents of any relevance strength.

As a final example of indexing and searching experiments, we ask whether these operations perform most effectively when done manually or when they are automated in some way. Remarkably few such tests have been done, because the comparison is so difficult: it is like comparing apples and pears. Three attempts are illustrated here.

[Figure 8.1 Retrieval comparison taken from Swanson's test5: manual subject index, four results, versus automated text retrieval; plotted against the number of irrelevant documents retrieved.]

Figure 8.1 shows D. R. Swanson's graph5 with four results of manual searches on a subject index versus a performance curve based on full text interrogation with a thesaurus. As has been noted, there was little control in this comparison; the machine result is superior. Figure 8.2 reveals an opposite order of merit for the manual and automated comparison; here the indexing exhaustivity was held constant on the Smart system18. Swanson's automated result may therefore be more a reflection of higher exhaustivity than anything else. Figure 8.3 is a hitherto unpublished result comparing ISILT manual indexing with K. Sparck Jones's automated indexing19. As many as possible of the variables between the two methods were removed, the two remaining differences being: in the manual system the terms were slightly more specific, leading to a precision gain and a loss in the recall ceiling; and the definition of a subsearch (used to generate the performance curve) differed a little, with an unknown effect. Here the manual system is a little better, but not at all recall levels.
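To make the measures behind these curves explicit, here is a minimal sketch using the standard recall and precision definitions; the notation is ours and is not drawn from the tests themselves. If a search is stopped after retrieving $n$ documents, of which $r$ are relevant, and the collection contains $R$ relevant documents in total, then

\[
\text{recall} = \frac{r}{R}, \qquad \text{precision} = \frac{r}{n}.
\]

Each stopping point (or machine-like term-matching level) yields one recall-precision pair, and joining the pairs across stopping points traces the performance curve. On this reading, the recall ceiling is the recall reached at the loosest stopping point, which is why more specific terms can produce a precision gain while lowering that ceiling.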