Information Retrieval Experiment
Chapter: Laboratory tests of manual systems
E. Michael Keen
Edited by Karen Sparck Jones
Butterworth & Company
8.3 The conclusions of information retrieval testing
Has all this testing activity led to a set of general conclusions about the design
and operation of information retrieval systems, especially manual ones? It is
perhaps no surprise that even the answer to such a question is a matter for
opinion and debate. Both the content and the status of general findings are
viewed in various ways.
Laws, rules or principles?
If information retrieval is a behavioural science it is unlikely that inviolate
laws await discovery. Researchers have therefore more wisely talked about
there being hypotheses, rules or fundamental principles. Cyril Cleverdon23
specified 13 hypotheses arising from Cranfield 1; Gerard Salton set out a set
of rules governing automatic text analysis24; and Michael Keen and Jeremy
Digger gave ten findings in the form of principles9. Cyril Cleverdon has
referred to three principles he regards as fundamental25, which may be
stated in the present writer's terms as follows:
(1) As a search proceeds and retrieves an increasingly larger number of
documents, so the numbers of relevant and irrelevant documents
retrieved increase monotonically, as also do the measures of recall and
fallout. Because the precision ratio is related to both these measures,
there is a high probability that there will be an inverse relationship
between recall and precision.
(2) If indexing exhaustivity is increased, the recall ceiling rises. For a given
desired level of recall there is an optimum level of indexing exhaustivity:
below this level recall will suffer, and above it precision will deteriorate.
However, the optimum level may have a quite wide range of acceptable
values26.
(3) If indexing specificity is increased, the precision ratio rises. Specificity
may be adjusted either by the semantic specificity of the index terms or by
the levels of term combination usable in searching. For a given desired
level of precision there is an optimum level of specificity, though the
range of values is not well understood.
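Principle (1) can be made concrete with a small sketch. The following Python fragment is not from the text: the ranked list and relevance judgements are invented for illustration. It computes recall, precision and fallout cumulatively after each retrieved document, using the standard definitions (recall = relevant retrieved / total relevant; precision = relevant retrieved / total retrieved; fallout = irrelevant retrieved / total irrelevant), and shows that recall and fallout grow monotonically while precision fluctuates.

```python
def metrics_at_cutoffs(ranked, relevant, collection_size):
    """Yield (recall, precision, fallout) after each retrieved document."""
    total_rel = len(relevant)
    total_irrel = collection_size - total_rel
    rel_seen = 0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            rel_seen += 1
        recall = rel_seen / total_rel
        precision = rel_seen / rank
        fallout = (rank - rel_seen) / total_irrel
        yield recall, precision, fallout

# Hypothetical search output and relevance judgements:
ranked = ["d3", "d7", "d1", "d9", "d4", "d2"]
relevant = {"d3", "d1", "d2"}
rows = list(metrics_at_cutoffs(ranked, relevant, collection_size=100))

# Recall and fallout never decrease as more documents are retrieved,
# whereas precision starts high and tends to fall:
recalls = [r for r, p, f in rows]
fallouts = [f for r, p, f in rows]
assert recalls == sorted(recalls)
assert fallouts == sorted(fallouts)
```

Because every further retrieved document either is relevant (raising recall) or is not (raising fallout), both measures are monotone in the cutoff; precision, being their ratio-like companion, is what typically trades off against recall.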
The first of these three principles incorporates some of the important
qualifications that safeguard a naive view of the recall/precision trade-off, as
spelled out by Cyril Cleverdon27, but misunderstandings and disagreement
break out from time to time. The writer's view of the more detailed practical
findings of manual laboratory tests adds the following ten matters:
(4) Different types of classificatory index language do not substantially
differ in performance merit (Cranfield 1).
(5) Controlled index languages, such as classification, alphabetical headings
and multiple entry systems (e.g. Uniterm, thesauri, etc.) differ little in
performance (Cranfield 1, Cranfield 2, ISILT, Off-shelf).
(6) Index languages uncontrolled at the indexing stage do not have an
inferior performance to controlled ones (Cranfield 2, ISILT).
(7) Extensive cross-references are not needed for high recall, and there is an
optimum level above which precision suffers (Cranfield 1, Cranfield 2,
ISILT).
(8) Syntactical devices used explicitly in searching (e.g. links, roles,