Information Retrieval Experiment (IRE), edited by Karen Sparck Jones. Butterworth & Company.
Chapter: The pragmatics of information retrieval experimentation, by Jean M. Tague.

of queries with the same number of terms, but this makes an overall assessment of performance difficult. Co-ordination level averaging is best used when the number of terms does not vary much from query to query. Similarly, for query-document weight cut-off or document rank cut-off, the set of scores or ranks may differ from query to query, so these methods are best used when the range of scores or the size of the output does not vary much from query to query.

If precision at standard recall values is used, then selection, interpolation, and extrapolation may be needed to obtain a single precision value for each recall value. The two possibilities for interpolation and extrapolation are linear interpolation/extrapolation and 'pessimistic' interpolation/extrapolation, i.e. using the precision value for the next higher recall point.

The differences among the various methods are illustrated in a simplified example, Figure 5.3, in which a recall-precision curve is calculated using four different methods:

(1) Retrieval cut-off by document-query scores, with co-ordination level scoring and microaveraging of recall and precision.
(2) Retrieval cut-off by document rank, with cosine coefficient scoring and microaveraging of precision and recall.
(3) Interpolation of average precision values at standard recall values, from the data in (2), using linear interpolation.
(4) Interpolation of average precision values at standard recall values, from the data in (2), using pessimistic interpolation.

The data which generated the curves in Figure 5.3 are given by the following arrays:

Document-Term Incidence Array:

    Document    Term Number
    Number      1  2  3  4  5  6  7  8
       1        1  1  1  1  0  0  0  0
       2        1  1  0  0  1  1  0  0
       3        0  0  0  0  0  1  1  1
       4        0  0  0  1  0  1  1  0
       5        0  1  1  0  0  0  0  0
       6        1  0  0  0  0  0  0  1
       7        0  0  0  0  1  1  1  1
       8        0  1  1  1  0  0  0  0
       9        0  1  0  0  0  1  1  0
      10        1  0  0  0  1  0  0  0

Figure 5.3 (opposite). Recall-precision curves for a simplified retrieval output, using four methods of determining the points at which recall and precision will be averaged. (a) c: Co-ordination level points; d: Document rank points. (b) l: Standard recall points, linear interpolation; p: Standard recall points, pessimistic interpolation.
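
The two interpolation rules named in methods (3) and (4) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used to draw Figure 5.3. The function names and the sample recall-precision points are hypothetical. Clamping extrapolated values to the range 0 to 1, and returning None when no higher recall point exists, are choices made here for convenience rather than anything prescribed in the text.

```python
from bisect import bisect_left


def linear_precision(points, r):
    """Linear interpolation/extrapolation of precision at recall r.

    points: at least two (recall, precision) pairs, sorted by strictly
    increasing recall.
    """
    recalls = [rec for rec, _ in points]
    i = bisect_left(recalls, r)
    if i < len(points) and recalls[i] == r:
        return points[i][1]  # selection: an observed point sits exactly at r
    # Use the bracketing segment, or the nearest end segment when r lies
    # outside the observed recall range (extrapolation).
    hi = min(max(i, 1), len(points) - 1)
    (r0, p0), (r1, p1) = points[hi - 1], points[hi]
    value = p0 + (p1 - p0) * (r - r0) / (r1 - r0)
    # Assumption: keep extrapolated values inside the valid range [0, 1].
    return min(1.0, max(0.0, value))


def pessimistic_precision(points, r):
    """'Pessimistic' rule: precision at the next higher observed recall point."""
    recalls = [rec for rec, _ in points]
    i = bisect_left(recalls, r)  # first observed recall >= r
    if i == len(points):
        return None  # assumption: undefined when no higher recall point exists
    return points[i][1]


# Hypothetical observed (recall, precision) points for one query.
points = [(0.25, 1.0), (0.5, 0.5), (0.75, 0.6), (1.0, 0.4)]

for r in (0.125, 0.375, 0.5, 0.875):
    print(r, linear_precision(points, r), pessimistic_precision(points, r))
```

Where an observed point falls exactly on a standard recall value, both rules reduce to selection and return the same precision. Between observed points they generally differ, which is why the l and p curves in panel (b) need not coincide.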