In the former case, the two methods could then be strictly compared or evaluated against one another, one method definitely being said to be better than the other. In the latter case, however, the decision may not be so clear, or the difference in techniques used may not be as material as the difference in basic assumptions.

User-related concepts

This group of concepts is much more obviously likely to affect system evaluation directly than that previously discussed. The basic reason is that all evaluation measures, save perhaps effectiveness, depend strictly upon at least one of them, and in a much more obvious way than upon the text-related concepts. Therefore, it is necessary to have a well-defined concept of information need in order to be able to interpret and use properly the user's judgements of the system's performance; that is, the user's satisfaction or dissatisfaction. The Cranfield experiments, and others, recognized and attempted to control for the need problem by eliminating it entirely through the use of artificial questions (that is, questions without underlying needs). The relevance judgements were then carried out in an 'objective' manner, untainted by individual differences among variable users. This strategy is useful in that it explicitly recognized the difficulty of dealing with individual information needs. The problem is that there is no a priori reason to suppose that the performance of a system measured in this way correlates at all well with performance as evaluated by posers of real questions.

Furthermore, such evaluation techniques tend to assume that the user needs or desires all of the relevant documents. Cooper33 and Oddy34, among others, argue cogently against this assumption, and it seems that in many cases what the user desires is not all of the potentially relevant documents but, say, only one useful one. The concept of utility as an evaluation measure is in a sense a recognition of the importance of taking account of desire on the part of the user. If these user-related factors are ignored, then the evaluation measures which depend upon them, although certainly measuring something, may not be measuring anything practically useful.

Confounded concepts

Satisfaction (of need, of desire) is of course the basic concept in information retrieval system evaluation, and as such cannot be ignored in any test. The various concepts of relevance which have been proposed and used testify to its importance, and to its intractability. There appear to be two strong reasons for making sure that the operational definition of satisfaction is closely related to user judgements. One is that, if user judgement is factored out, then the basis for evaluation of system performance may be unrelated to real situations.
The other is that it seems clear that actual satisfaction judgements are order-dependent, and this cannot be dealt with unless one works within a context in which needs are assumed to change with new information. This last point is especially difficult to deal with in any testing environment, whether one recognizes its importance or not, and appears to require the development of some quite new experimental paradigms and evaluation measures.
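The contrast between the two positions can be made concrete. The sketch below (in Python, using invented documents and judgements that do not come from any experiment described in this chapter) compares a recall-style measure, which assumes the user wants every relevant document, with a simple measure in the spirit of the utility view: the reciprocal rank of the first document the user actually finds useful. Recall scores any ordering of the same retrieved set identically, while the utility-style score changes with order, which is precisely the property conventional tests fail to capture.

# Illustrative sketch only: data and measure names are invented
# for this example, not taken from the chapter or its references.

def recall(ranking, relevant):
    """Fraction of all relevant documents retrieved.
    Ignores order entirely: any permutation of `ranking` scores the same."""
    retrieved_relevant = sum(1 for doc in ranking if doc in relevant)
    return retrieved_relevant / len(relevant)

def first_useful_utility(ranking, useful):
    """Reciprocal rank of the first useful document.
    Order-dependent: a useful document seen earlier raises the score."""
    for rank, doc in enumerate(ranking, start=1):
        if doc in useful:
            return 1.0 / rank
    return 0.0

relevant = {"d1", "d2", "d3"}    # documents judged topically relevant
useful = {"d3"}                  # the one document this user actually wants

ranking_a = ["d3", "d1", "d2", "d4"]  # useful document presented first
ranking_b = ["d1", "d2", "d3", "d4"]  # same set, useful document third

for name, ranking in [("A", ranking_a), ("B", ranking_b)]:
    print(name, recall(ranking, relevant), first_useful_utility(ranking, useful))

# Recall is 1.0 for both rankings; the utility-style score is 1.0 for
# ranking A but only 1/3 for ranking B. Only the second measure reflects
# the user's desire and the order in which documents are encountered.

Even this small example understates the difficulty the text raises: it treats 'useful' as fixed in advance, whereas order-dependent satisfaction implies that what counts as useful may itself shift as the user reads, which no static test collection of this kind can represent.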