In the former case, the two methods could then be strictly compared or evaluated against one another, one method definitely being said to be better than the other. In the latter case, however, the decision may not be so clear, or the difference in techniques used may not be as material as the difference in basic assumptions.
User-related concepts
This group of concepts is much more obviously likely to affect system evaluation directly than that previously discussed. The basic reason is that all evaluation measures, save perhaps effectiveness, depend strictly upon at least one of them, and in a much more obvious way than upon the text-related concepts. Therefore, it is necessary to have a well-defined concept of information need in order to be able to interpret and use properly the user's judgements of the system's performance; that is, the user's satisfaction or dissatisfaction. The Cranfield experiments, and others, recognized and attempted to control for the need problem by eliminating it entirely through the use of artificial questions (that is, questions without underlying needs). Then the relevance judgements were carried out in an 'objective' manner, untainted by individual differences among variable users. This strategy is useful in that it explicitly recognized the difficulty of dealing with individual information needs. The problem is that there is no a priori reason to suppose that the performance of a system measured in this way correlates at all well with performance as evaluated by posers of real questions.
Furthermore, such evaluation techniques tend to assume that the user needs or desires all of the relevant documents. Cooper33 and Oddy34, among others, argue cogently against this assumption, and it seems that in many cases, what the user desires is not all of the potentially relevant documents, but, say, only one useful one. The concept of utility as an evaluation measure in a sense is a recognition of the importance of taking account of desire on the part of the user. If these user-related factors are ignored, then the evaluation measures which depend upon them, although certainly measuring something, may not be measuring anything practically useful.
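The difference is easy to picture with a toy calculation. The following sketch (the documents, the judgements, and the function names are invented here for illustration, and are not drawn from any cited experiment) contrasts a recall-oriented score, which rewards retrieving all of the relevant documents, with a utility-style score that asks only whether the user has been given one useful document:

    def recall_at_k(ranking, relevant, k):
        """Fraction of all relevant documents appearing in the top k."""
        return len(set(ranking[:k]) & relevant) / len(relevant)

    def satisfied_at_k(ranking, useful, k):
        """Utility-style score: 1 if the top k contain any document this user finds useful."""
        return int(any(doc in useful for doc in ranking[:k]))

    ranking = ["d2", "d5", "d8", "d4"]   # system output, best first (hypothetical)
    relevant = {"d2", "d4", "d5", "d8"}  # everything an assessor would call relevant
    useful = {"d4"}                      # the single document this user actually wants

    print(recall_at_k(ranking, relevant, k=3))    # 0.75: good by a recall standard
    print(satisfied_at_k(ranking, useful, k=3))   # 0: yet this user is not satisfied

On these figures the recall-oriented measure pronounces the system good while the user-centred measure pronounces it a failure; which verdict matters depends on whose desire the measure is supposed to reflect.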
Confounded concepts
Satisfaction (of need, of desire) is of course the basic concept in information retrieval system evaluation, and as such cannot be ignored in any test. The various concepts of relevance which have been proposed and used testify to its importance, and to its intractability. There appear to be two strong reasons for making sure that the operational definition of satisfaction is closely related to user judgements. One is, that if user judgement is factored out, then the basis for evaluation of system performance may be unrelated to real situations. The other is that it seems clear that actual satisfaction judgements are order-dependent, and this cannot be dealt with unless one works within a context in which needs are assumed to change with new information. This last point is especially difficult to deal with in any testing environment, whether one recognizes its importance or not, and appears to require the development of some quite new experimental paradigms and evaluation measures.
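To see why order-dependence defeats a fixed relevance judgement, consider a purely illustrative sketch (the documents, their contents, and the scoring rule are invented here) in which a document's value is discounted by whatever the user has already seen:

    def order_dependent_gain(ranking, info_content):
        """Total gain when previously seen information contributes nothing new."""
        seen, total = set(), 0
        for doc in ranking:
            total += len(info_content[doc] - seen)  # count only unseen items
            seen |= info_content[doc]
        return total

    # Hypothetical documents described by the pieces of information they carry.
    info = {"d1": {"a", "b"}, "d2": {"a", "b"}, "d3": {"c"}}

    print(order_dependent_gain(["d1", "d2"], info))  # 2: d2 duplicates d1, adds nothing
    print(order_dependent_gain(["d1", "d3"], info))  # 3: d3 brings new information

A static judgement would call d2 and d3 equally relevant, yet once d1 has been seen the user's need has changed and d2 no longer satisfies it; capturing this is exactly what the order-sensitive experimental paradigms called for above would have to do.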