Information Retrieval Experiment
Edited by Karen Sparck Jones
Butterworth & Company

Laboratory tests of manual systems
E. Michael Keen
searches so that a path or search curve can be plotted. This has been done not
only with machine-like levels of term matching, but also in freely pursued
manual searches: by varying the stopping points, the different recall needs of
users can be simulated. Thus in ISILT and the printed index tests the
minimum point was the need for a single highly relevant document,
proceeding through intermediate points to the need for all available
documents of any relevance strength.
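The mechanics of such a curve can be illustrated in outline. The following Python sketch is purely illustrative and is not drawn from any of the tests discussed: it records, at each possible stopping point in a search output, the number of irrelevant documents retrieved against the percentage of relevant documents found, which are the axes used in Figure 8.1.

    # Illustrative sketch only: the documents and relevance judgements
    # below are invented, not taken from ISILT or the printed index tests.

    def search_curve(ranked_docs, relevant):
        """At each stopping point in the search output, record
        (irrelevant documents retrieved, per cent of relevant retrieved)."""
        points = []
        hits = misses = 0
        for doc in ranked_docs:
            if doc in relevant:
                hits += 1
            else:
                misses += 1
            points.append((misses, 100.0 * hits / len(relevant)))
        return points

    # Ten documents in search order, three of them judged relevant.
    ranked = ["d3", "d7", "d1", "d9", "d2", "d8", "d5", "d4", "d6", "d0"]
    relevant = {"d3", "d9", "d5"}
    for irrel, recall in search_curve(ranked, relevant):
        print(f"{irrel} irrelevant retrieved -> {recall:.0f}% recall")

Stopping early on this output simulates the user who wants only one highly relevant document; running to the end simulates the user who wants everything of any relevance strength.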
As a final example of indexing and searching experiments we ask whether
these operations perform most effectively when done manually or when they
are automated in some way. Remarkably few tests of this have been made,
because it is such a difficult comparison: like comparing apples and pears.
Three attempts are now illustrated.
[Figure 8.1 Retrieval comparison taken from Swanson's test5: manual
subject index, four results, versus automated text retrieval. Horizontal
axis: no. of irrelevant documents retrieved (0 to 20); vertical axis:
0 to 100 per cent.]
Figure 8.1 shows D. R. Swanson's graph5 with four results of manual
searches on a subject index versus a performance curve based on full text
interrogation with a thesaurus. As has been noted, there was little control in
this comparison: the machine result is superior. Figure 8.2 reveals the
opposite order of merit for the manual versus automated comparison; here
the indexing exhaustivity was held constant on the Smart system18, so the
automated result of Swanson may be more a reflection of higher exhaustivity
than of anything else. Figure 8.3 is a hitherto unpublished result comparing
ISILT manual with K. Sparck Jones automated19. As many of the variables
as possible between the two methods were removed, the two remaining
differences being that in manual the terms were slightly more specific, leading
to a precision gain and a recall ceiling loss, and that the definition of a
subsearch (used to generate the performance curve) differed a little, with an
unknown effect. Here the manual system is a little better, but not at all recall
levels.
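To make concrete what "better, but not at all recall levels" means, a comparison of this kind reduces to reading each system's precision off its performance curve at a series of fixed recall levels. The figures in the following sketch are invented for illustration and are not the ISILT or Sparck Jones results.

    # Invented (recall %, precision %) points for two hypothetical systems;
    # not data from Figure 8.3.
    manual = {10: 80, 30: 62, 50: 45, 70: 30, 90: 12}
    automated = {10: 70, 30: 58, 50: 48, 70: 34, 90: 15}

    for r in sorted(manual):
        better = "manual" if manual[r] > automated[r] else "automated"
        print(f"at {r}% recall: manual {manual[r]}%, "
              f"automated {automated[r]}% -> {better} ahead")

In this invented case the manual system leads at low recall and the automated system at high recall, so neither can be declared superior overall: the verdict depends on which recall need the user brings to the search.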