IRE
Information Retrieval Experiment
Laboratory tests: automatic systems
chapter
Robert N. Oddy
Butterworth & Company
Karen Sparck Jones
All rights reserved. No part of this publication may be reproduced
or transmitted in any form or by any means, including photocopying
and recording, without the written permission of the copyright holder,
application for which should be addressed to the Publishers. Such
written permission must also be obtained before any part of this
publication is stored in a retrieval system of any nature.
cannot easily be mirrored in laboratory systems, but which may have
considerable impact on perceived performance. I should like to point out
that these problems fall into two categories, although the categories are
subtly interrelated and I shall be forced to discuss them together: some relate
to what may be called parameters (environmental factors, system design
features, charging algorithms, for instance) and their effect upon the retrieval
effectiveness obtainable, and others relate to the goals of the user and the
system and how effectiveness should be measured. The debate on what
comprises the effectiveness of an information retrieval system is long and
involved. Notable contributions have been made by Cleverdon14, 37,
Lancaster38, Cooper39 and a number of others. Van Rijsbergen12 restricts
the term 'effectiveness' to refer to 'the ability of the system to retrieve relevant
documents while at the same time holding back non-relevant ones' (p.145),
and it is this type of effectiveness, and only this type, that is measured by very
nearly all laboratory tests of automatic systems (one exception is a test by
Oddy33 of a browsing mechanism in which measurements related to user
effort were made). Relevance-based effectiveness measures are also used, inter
alia, in real life experiments. Now, in order to establish a fruitful relationship
between the laboratory tests and their hypothetical real life analogues, we
must ask two questions:
(1) Is relevance-based effectiveness safely separable from other performance
characteristics for experimental purposes?
(2) Is relevance in real life the same as relevance in laboratory tests?
Aspects of performance which may be regarded as important by users
include the effort that they must expend, the response speed of the system,
and the cost-effectiveness14, 40, 41. If a system is poor in any of these respects
then, clearly, its achievements in the recall/precision domain may simply not
be appreciated by the users. However, I think the connection between the
different components of performance is deeper than that. System parameters
such as the types and powers of storage devices, computer processors, and
communication equipment, the complexity of algorithms, the ergonomics of
terminal design, and the user interface facilities42 are all factors which
strongly influence performance and which are not usually investigated in
information retrieval tests. The assumption made is that the relevance of a
document to a query does not depend on such aspects of performance.
Relevance in tests is a simple abstract entity, a relation between queries and
documents: any links between its real life correlate and characteristics like
user effort and response time are disregarded. Of course, such links do exist
and they are complex, and have yet to be investigated properly. They arise
out of the cognitive activity of the user during the searching process. The user
will normally be trying to fulfil some purpose, which will determine the use
he makes of the system's output. His progress towards his objective, and thus
his attitudes towards the search output, will vary as the search itself proceeds
(of which, more will be said presently). Therefore, we must expect every
apparent aspect of system behaviour to have some influence on relevance-
based effectiveness measurements. I am aware of no experiment which
attempts to quantify any of this class of effects, although the effects are
widely acknowledged43, so I am unable to answer question (1), above.
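
The abstraction described above can be made concrete with a minimal sketch, assuming that relevance is held as a fixed set of (query, document) pairs and that effectiveness is summarised by precision and recall. The query and document identifiers, the judgements, and the function name precision_recall are hypothetical and chosen only for illustration; they are not taken from any of the tests cited.

```python
from typing import Set, Tuple


def precision_recall(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Return (precision, recall) for one query.

    precision = relevant retrieved / all retrieved
    recall    = relevant retrieved / all relevant
    Either value is reported as 0.0 when its denominator is empty.
    """
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical relevance judgements: a static relation between queries
# and documents, independent of user effort or response time.
relevance = {("q1", "d1"), ("q1", "d3"), ("q1", "d4")}

# Hypothetical system output for query q1.
retrieved_q1 = {"d1", "d2", "d3", "d5"}

relevant_q1 = {doc for (query, doc) in relevance if query == "q1"}
p, r = precision_recall(retrieved_q1, relevant_q1)
print(f"precision={p:.2f} recall={r:.2f}")
```

Nothing in this calculation refers to the effort, time, or cost experienced by the user, which is precisely the separation that question (1) calls into doubt.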
I have said that in laboratory tests, simple abstractions of the phenomenon