Information Retrieval Experiment

Retrieval effectiveness (chapter), Cornelis J. van Rijsbergen. Edited by Karen Sparck Jones. Butterworth & Company.

All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, including photocopying and recording, without the written permission of the copyright holder, application for which should be addressed to the Publishers. Such written permission must also be obtained before any part of this publication is stored in a retrieval system of any nature.

Optimal retrieval

Hence we seek a sample of relevant documents to be able to estimate $P(x/A)$; $P(A)$ is the same for each $x$, and $P(x)$ is given by the system data.

It is possible to formulate an approach to optimal retrieval without using ranking. For this we need some elementary decision theory. The basis of it is that certain costs are associated with the decisions that the retrieval system will make. Assume that for each document the system can take one of two actions, $a_1$: retrieve, $a_2$: not retrieve, and that each document is in one of two states, either $w_1$: relevant or $w_2$: non-relevant. Then we can associate with each (action, state) pair a cost $l_{ij}$. A table shows the association:

$$
\begin{array}{c|cc}
      & w_1    & w_2    \\ \hline
a_1   & l_{11} & l_{12} \\
a_2   & l_{21} & l_{22}
\end{array}
$$

With each action $a_i$ we can associate an expected cost, viz.:

$$R(a_i/x) = \sum_j l_{ij} P(w_j/x)$$

where $P(w_j/x)$ is the probability that the document will be in state $w_j$. Intuitively it would seem reasonable to perform that action which has the smallest expected cost associated with it. In fact, such a strategy is optimal in the following sense. If the decision rule is $a(x)$, i.e. $a(x)$ takes the value $a_1$ or $a_2$ for each $x$, then the overall risk $R$ is defined as

$$R = \sum_x R(a(x)/x) P(x)$$

This function $R$ can be minimized by choosing, for each $x$, the smaller of $R(a_1/x)$, $R(a_2/x)$.
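The claim that picking the smaller expected cost for each x minimizes the overall risk can be checked numerically on a toy case. The following Python sketch is only an illustration: the cost values, the three document descriptions `x1`, `x2`, `x3`, and their probabilities are all made-up assumptions, not data from the chapter. It computes the expected cost of each action, builds the pointwise rule, and compares its overall risk with that of every other possible rule.

```python
from itertools import product

# Hypothetical costs: LOSS[i][j] is the cost of action a_(i+1) in state w_(j+1).
LOSS = [[0.0, 1.0],   # a1 = retrieve:        l11, l12
        [2.0, 0.0]]   # a2 = do not retrieve: l21, l22

def expected_cost(action, p_rel):
    """R(a_i/x) = sum_j l_ij P(w_j/x), with P(w1/x) = p_rel and P(w2/x) = 1 - p_rel."""
    posterior = (p_rel, 1.0 - p_rel)
    return sum(l * p for l, p in zip(LOSS[action], posterior))

def overall_risk(rule, docs):
    """R = sum_x R(a(x)/x) P(x); docs maps x to the pair (P(x), P(w1/x))."""
    return sum(expected_cost(rule[x], p_rel) * p_x
               for x, (p_x, p_rel) in docs.items())

# Hypothetical document descriptions x with (P(x), P(w1/x)).
DOCS = {"x1": (0.5, 0.1), "x2": (0.3, 0.4), "x3": (0.2, 0.8)}

# Pointwise rule: for each x pick the action (0 = a1, 1 = a2) with the smaller expected cost.
pointwise = {x: min((0, 1), key=lambda a: expected_cost(a, p_rel))
             for x, (_, p_rel) in DOCS.items()}

# Enumerate every possible decision rule a(x) over these three descriptions.
all_rules = [dict(zip(DOCS, actions))
             for actions in product((0, 1), repeat=len(DOCS))]
best_risk = min(overall_risk(rule, DOCS) for rule in all_rules)

print(pointwise, overall_risk(pointwise, DOCS), best_risk)
```

Because the overall risk is a sum of non-negative terms, one per x, no rule among the eight enumerated here does better than the pointwise choice.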
Therefore the retrieval rule will read:

if $R(a_1/x) < R(a_2/x)$ then retrieve else do not retrieve*

Optimality here means minimizing the overall risk function.

It is interesting to analyse this retrieval rule in a little more detail. Writing out the expected cost functions in full we get

$$R(a_1/x) = l_{11} P(w_1/x) + l_{12} P(w_2/x)$$

$$R(a_2/x) = l_{21} P(w_1/x) + l_{22} P(w_2/x)$$

If we define a reasonable cost function we would set $l_{11} = l_{22} = 0$, thus reducing the comparison $R(a_1/x) < R(a_2/x)$ to

$$l_{12} P(w_2/x) < l_{21} P(w_1/x)$$

or equivalently

$$\frac{P(w_1/x)}{P(w_2/x)} > \frac{l_{12}}{l_{21}}$$

* As usual, equality is treated by deciding randomly.
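As a minimal sketch of the reduced rule, assuming zero cost for correct decisions (l11 = l22 = 0) and positive costs l12 and l21, the Python below implements the comparison of expected costs, with ties broken randomly, alongside the equivalent odds form. The function names `retrieve` and `odds_exceed_cost_ratio` and the numeric values in the usage lines are hypothetical, chosen only for illustration.

```python
import random

def retrieve(p_rel, l12, l21, rng=random):
    """Return True to retrieve, given P(w1/x) = p_rel and l11 = l22 = 0."""
    cost_retrieve = l12 * (1.0 - p_rel)   # R(a1/x) = l12 P(w2/x)
    cost_skip = l21 * p_rel               # R(a2/x) = l21 P(w1/x)
    if cost_retrieve == cost_skip:
        return rng.random() < 0.5         # equality: decide randomly
    return cost_retrieve < cost_skip

def odds_exceed_cost_ratio(p_rel, l12, l21):
    """Equivalent form P(w1/x)/P(w2/x) > l12/l21 (assumes p_rel < 1 and l21 > 0)."""
    return p_rel / (1.0 - p_rel) > l12 / l21

# Hypothetical example: missing a relevant document costs twice as much
# as retrieving a non-relevant one.
print(retrieve(0.8, l12=1.0, l21=2.0), odds_exceed_cost_ratio(0.8, 1.0, 2.0))
print(retrieve(0.1, l12=1.0, l21=2.0), odds_exceed_cost_ratio(0.1, 1.0, 2.0))
```

Under these assumptions the rule depends on the costs only through the ratio l12/l21, so raising the cost of a missed relevant document (l21) lowers the odds of relevance needed before a document is retrieved.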