IRE
Information Retrieval Experiment
Laboratory tests of manual systems
chapter
E. Michael Keen
Butterworth & Company
Karen Sparck Jones
All rights reserved. No part of this publication may be reproduced
or transmitted in any form or by any means, including photocopying
and recording, without the written permission of the copyright holder,
application for which should be addressed to the Publishers. Such
written permission must also be obtained before any part of this
publication is stored in a retrieval system of any nature.
superiority of the manual approach in precision at medium and low recall is
a more reliable finding (at similar levels of indexing exhaustivity), it must be
remembered that term weighting procedures developed more recently by K.
Sparck Jones and G. Salton may well have narrowed or removed that gap.
The trouble is that we simply do not know, and designing a test comparison of
this kind is a major challenge to the tester's ingenuity.
Printed index comparisons
Tests in this category could be regarded as experiments on index languages,
indexing or searching, but they are separated out here because page-format
indexes seem to have been absent from experiments of those types. Evaluators
have also been late in tackling printed indexes. This may be due to their
users' relative satisfaction with their apparent performance. Or it may
reflect the great difficulties facing anyone who tests such heuristically
flexible page-scanning search practices. Even under tight laboratory
conditions, the INSPEC comparison²⁰ acknowledged that methodological
problems overlaid its test results. The Aberystwyth Off-shelf test⁶ of
published indexes covering library and information science had to face
similar problems, and it was not possible to prove that the dissimilar
document collections covered by the six indexes had not influenced the
results unevenly (see Figure 8.6 later). One useful experiment²¹,²², conducted
in an operational environment, compared nine subject catalogues in various
formats, including printed ones.
The hazard faced by a deeper laboratory test such as EPSILON¹⁴⁻¹⁶ is
that the very care taken to control the construction of the indexes may allow
one index to exert undue influence on the others. However, this was the only
satisfactory way to tackle, once again, the arguments of the 1970s concerning
the efficacy of subject indexes constructed by schemes such as chain
procedure, PRECIS, articulated and rotated (KWAC) indexing. The foci of the
comparisons made were:
(1) Full versus no context, as in preserved-entry systems versus ones with lead
terms only.
(2) Direct versus indirect entry, as multiple entry rotated (e.g. KWAC)
versus chain procedure (e.g. British Technology Index).
(3) Full versus partial provision of function words, as KWAC or articulated
versus rotated term or PRECIS.
(4) Active versus predominantly fragmented term order, as KWAC or
rotated versus articulated or PRECIS.
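The contrast behind comparisons (1) and (2) can be made concrete by generating entries under two of the schemes. The following is a minimal Python sketch under stated assumptions. It is a simplified illustration, not the procedure used to build the EPSILON indexes. The stopword list, the "/" wrap marker, the " : " separator, and the sample title and class chain are all hypothetical choices made for the example.

```python
# Hypothetical stopword list: function words that never lead an entry.
STOPWORDS = {"a", "an", "and", "by", "for", "in", "of", "on", "the"}


def rotated_entries(title: str) -> list[str]:
    """Simplified rotated (KWAC-style) index: every significant word leads one
    entry, followed by the rest of the title in cyclic order. The "/" marks
    where the title wraps round to its start (an assumed convention)."""
    words = title.split()
    entries = []
    for i, word in enumerate(words):
        if word.lower() in STOPWORDS:
            continue  # function words stay in context but never lead
        if i == 0:
            entries.append(" ".join(words))
        else:
            entries.append(" ".join(words[i:] + ["/"] + words[:i]))
    # Filing order: alphabetical by lead term, ignoring case.
    return sorted(entries, key=str.lower)


def chain_entries(chain: list[str]) -> list[str]:
    """Simplified chain procedure: `chain` runs from broadest to most specific
    term. Each link leads one entry qualified by its broader links in reverse
    order, so only the entry for the most specific term carries the whole chain."""
    return [" : ".join(reversed(chain[:depth])) for depth in range(len(chain), 0, -1)]


if __name__ == "__main__":
    for entry in rotated_entries("Evaluation of printed subject indexes"):
        print(entry)
    print()
    for entry in chain_entries(["Physics", "Optics", "Lasers"]):
        print(entry)
```

In this sketch the rotated index gives each significant term a direct entry that carries the whole title as context. The chain entries for the broader terms shed the more specific links. This is roughly the kind of difference the comparisons above set out to measure.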
The findings of these comparisons will be indicated briefly in the list given
later.
Tackling the printed index page-scanning mode deepened the problem of
recording and analysing manual searching, and led in EPSILON to the use
of audio-recording, index copy marking and a test technique involving
scanning only selected index portions in order to measure accuracy in entry
relevance prediction. This was the first time the criterion of presentation
clarity had been the subject of an experiment, and it would seem that this
criterion, relevant to most information retrieval systems, had been neglected
in previous work.