IRE
Information Retrieval Experiment
Laboratory tests: automatic systems
chapter
Robert N. Oddy
Butterworth & Company
Karen Sparck Jones
of relevance are adopted for evaluation. Recently, theories for information
retrieval have emerged out of the background of experimental work and they
are founded upon the same abstractions. Robertson and Belkin44 have made
a distinction between two principles for ranking documents in response to a
query: one can rank according to probability of relevance or degree of
relevance. Probabilistic theories assume that relevance is a Boolean variable;
that is, it can take one of two values, relevant or non-relevant.
Systems based upon the use of matching functions or similarity measures,
e.g. co-ordination level and cosine correlation, would appear to be estimating
the degree of relevance, although evaluation is usually done with dichotomous
relevance judgements. Other assumptions typically made about relevance
are that, for any document-query pair, the relevance judgement is
independent of time, and of the other relevance judgements.
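The two matching functions named above can be illustrated with a minimal sketch. It assumes that queries and documents are represented simply as lists of index terms; the function names and the toy data in the usage note are hypothetical and are not drawn from any system discussed in this chapter. Both functions return a graded score, which is why such systems appear to estimate a degree of relevance even when they are evaluated against dichotomous judgements.

```python
import math
from collections import Counter


def coordination_level(query_terms, doc_terms):
    """Co-ordination level: number of distinct query terms found in the document."""
    return len(set(query_terms) & set(doc_terms))


def cosine_correlation(query_terms, doc_terms):
    """Cosine of the angle between the term-frequency vectors of query and document."""
    q = Counter(query_terms)
    d = Counter(doc_terms)
    dot = sum(q[t] * d[t] for t in q)  # Counter returns 0 for absent terms
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in d.values())
    )
    # An empty query or document has zero length; treat its score as 0.
    return dot / norm if norm else 0.0


def rank(query_terms, docs, score):
    """Return document names ordered by decreasing score (ties keep input order)."""
    return sorted(docs, key=lambda name: score(query_terms, docs[name]), reverse=True)
```

For example, given the query terms `["relevance", "judgement"]` and three toy documents, one containing both terms, one containing only `"relevance"`, and one containing neither, `rank` places them in that order under either function. The scores are graded values, not estimates of the probability that a document is relevant.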
The idea of relevance in the context of real information needs is complex
and poorly understood (information retrieval research can be viewed as our
attempt to understand it) and has been the subject of a substantial literature,
to which I refer the reader through Saracevic's excellent review article45. A
document retrieval system user generally makes a series of decisions about
documents. First, he may make a note (perhaps mental) of the existence of
the document; then he may decide to look at the document's contents;
finally, he may decide to make use of those contents in his own work. All of
these decisions can be regarded as relevance judgements, and the outcome of
each obviously depends upon the enquirer's perception of the document, the
purposes of the enquirer, and his existing knowledge. By his perception of
the document, I mean what aspects of its description or content the enquirer
sees (title, abstract, index terms, for example), and in what circumstances he
sees them (online or in a batch printout). The 'cognitive view' of perception46
is that perceived objects are interpreted through the knowledge, or world
model, of the perceiver. Online systems are often provided with so-called
browsing facilities, presumably to encourage the interleaving of mechanical
and intellectual effort (recommended by Doyle47, for instance). Unfortunately,
the high cost of using today's online services discourages many users
from taking the time to contribute significant intellectual effort during the
search. (Let us hope that this is a temporary situation.) Nevertheless, even
under these circumstances, a user's state of knowledge relating to his purpose
changes during a search. The purpose itself may also undergo change if
Belkin's48 analysis of the information retrieval situation is to be accepted. A
user comes to an information retrieval system because his state of knowledge
is, in some way, anomalous; that is, he has recognized that his mental world
model cannot cope with his problem in hand. It must be assumed that he may
not be able to specify what information is needed to resolve the anomaly. So,
his conceptualization of his purpose in searching the literature is subject to
modification as his knowledge, and thus the anomaly in his knowledge,
changes. The consequence of all this is that relevance is dependent upon
three factors related to the user (perception, purpose and knowledge),
which are causally closely related to each other, and subject to variation in
the course of an interactive search. The picture of relevance decisions that we
are obtaining is very different in nature from the relevance judgements
included in test collections. Thus my answer to question (2) is clearly 'No'.
What implication does this argument have for the results of laboratory