Information Retrieval Experiment
The pragmatics of information retrieval experimentation, by Jean M. Tague
Edited by Karen Sparck Jones
Butterworth & Company. All rights reserved.

Query-term incidence array:

                      Term number
Query number    1  2  3  4  5  6  7  8
      1         1  1  1  1  0  0  0  0
      2         1  1  0  0  1  0  0  0
      3         0  0  0  0  1  1  1  0
      4         0  0  0  0  0  0  1  1

Query-document relevance array:

                      Document number
Query number    1  2  3  4  5  6  7  8  9  10
      1         1  0  1  0  1  0  0  1  0  0
      2         1  1  0  0  0  0  0  1  0  1
      3         0  1  1  0  0  0  1  0  0  0
      4         0  0  0  0  0  0  1  0  0  1

From these arrays, the average precision and recall at each co-ordination level are as in Table 5.2.

TABLE 5.2
Co-ordination level    Average recall    Average precision
        1              11/13 = 0.846     11/27 = 0.407
        2               9/13 = 0.692      9/13 = 0.692
        3               5/13 = 0.385       5/6 = 0.833
        4               1/13 = 0.077       1/1 = 1.000

In the second method, documents are ranked by means of the query-document cosine similarity measure. Average recall and precision values at each of the 10 possible document retrieval cutoff ranks are as in Table 5.3.

TABLE 5.3
Document cutoff level    Average recall    Average precision
         1                   0.231              0.750
         2                   0.538              0.875
         3                   0.692              0.750
         4                   0.692              0.562
         5                   0.769              0.500
         6                   0.846              0.458
         7                   0.846              0.393
         8                   0.846              0.344
         9                   0.923              0.333
        10                   1.000              0.325

Before determining precision at standard recall values, several decisions must be made. First, a precision value must be assigned for recall = 1, and precision values must be assigned for recall values which occur more than once, such as 0.692. Either a linearly interpolated value or a minimum precision of 0 may be