Decision 9: How to analyse the data?
Microaverage of precision is 18/41 = 0.439. Macroaverage of precision is
2.5/4 = 0.625.
The choice of averaging method hinges on whether one wishes to give
documents or queries equal weight in the averaging process. However, if the
averages are to be used as sample estimates of population values, as discussed
in the next section, then the microaverages should be used, as these have the
statistically desirable property of maximum likelihood (see Tague and
Farradane15). Another advantage of microaveraging is that one does not
usually have to deal with the undefined value 0/0. In macroaveraging, one
can either set such ratios equal to 1 or throw out the query. Neither course is
really satisfactory.
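As a rough illustration of the difference, the following Python sketch computes both averages for four hypothetical queries whose counts are chosen only so that the totals reproduce the figures above (18 relevant documents retrieved out of 41 retrieved in all, with per-query precisions summing to 2.5); the actual counts in the example table may differ.

    # Hypothetical per-query counts (relevant retrieved, total retrieved),
    # chosen so the totals match the averages quoted in the text.
    queries = [(3, 3), (9, 10), (4, 8), (2, 20)]

    # Microaverage: pool the counts over all queries, then take one ratio.
    micro = sum(rel for rel, _ in queries) / sum(ret for _, ret in queries)

    # Macroaverage: take a precision ratio per query, then average the ratios.
    macro = sum(rel / ret for rel, ret in queries) / len(queries)

    print(f"microaverage precision = {micro:.3f}")  # 18/41 = 0.439
    print(f"macroaverage precision = {macro:.3f}")  # 2.5/4 = 0.625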
Another problem, thoroughly discussed by Sparck Jones21 and others,
relates to the recall-precision graph. Given ordered document output for a
set of queries, the recall-precision graph will depend on both the measure of
document-query similarity (the scores) and the choice of points to be
displayed on the graph. As described in Section 5.3, there are a number of
ways in which the document-query similarity can be measured. These
include:
(1) Co-ordination level, i.e. the number of terms matching between query
and document.
(2) Cosine coefficient and other weighting functions.
Documents may be ranked on the basis of any of these measures.
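A minimal sketch of these two matching functions, assuming the query and each document are represented as a set of terms (for co-ordination level) or as a dictionary of term weights (for the cosine coefficient); the function names and data structures are illustrative, not taken from the text.

    import math

    def coordination_level(query_terms, doc_terms):
        """Number of terms shared by query and document (co-ordination level)."""
        return len(set(query_terms) & set(doc_terms))

    def cosine(query_weights, doc_weights):
        """Cosine coefficient between query and document term-weight vectors,
        each given as a dict mapping term -> weight."""
        shared = set(query_weights) & set(doc_weights)
        dot = sum(query_weights[t] * doc_weights[t] for t in shared)
        norm_q = math.sqrt(sum(w * w for w in query_weights.values()))
        norm_d = math.sqrt(sum(w * w for w in doc_weights.values()))
        return dot / (norm_q * norm_d) if norm_q and norm_d else 0.0

    # Documents can then be ranked on either score, e.g.
    # ranked = sorted(docs, key=lambda d: cosine(query, d), reverse=True)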
In order to construct a recall-precision graph, the points at which recall
and precision values will be averaged over queries and displayed on the
graph must then be determined. There are four possibilities:
(1) Average recall and precision across queries at fixed document-query
similarity scores. This method works well with co-ordination level scores
but creates problems with document-query weights which assume a large
number of values.
(2) Average recall and precision across queries at fixed document ranks.
This method is useful when the document-query scores assume a large
number of values.
(3) Average recall and precision values at either fixed scores or fixed ranks
and then interpolate precision at standard recall values, for example
0, 0.1, 0.2, ..., 0.9, 1. This gives a smoother curve than Methods 1 and 2.
Two interpolation methods have been suggested:
(a) linear interpolation,
(b) interpolation to the left between averaged recall values ('pessimistic'
interpolation).
(4) Interpolate precision values at standard recall values for each query and
then average precision values over the queries.
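As a sketch of method (4), the fragment below computes recall and precision at each rank for a query, interpolates precision at the standard recall values 0, 0.1, ..., 1.0 (here using the common convention of taking the maximum precision at any recall at or above each level, which is only one of the interpolation rules mentioned above), and then averages the interpolated curves over queries; the data and names are illustrative.

    # Method (4): interpolate per query, then average over queries.
    STD_RECALL = [i / 10 for i in range(11)]   # 0, 0.1, ..., 1.0

    def interpolated_precision(ranked_relevance, n_relevant):
        """Precision interpolated at the standard recall levels for one query,
        taking at each level the maximum precision at any recall >= that level."""
        points = []                            # (recall, precision) at each rank
        hits = 0
        for rank, is_rel in enumerate(ranked_relevance, start=1):
            hits += is_rel
            points.append((hits / n_relevant, hits / rank))
        return [max((p for r, p in points if r >= level), default=0.0)
                for level in STD_RECALL]

    def averaged_curve(rankings, relevant_counts):
        """Average the per-query interpolated curves over all queries."""
        curves = [interpolated_precision(rel, n)
                  for rel, n in zip(rankings, relevant_counts)]
        return [sum(col) / len(curves) for col in zip(*curves)]

    # Two illustrative queries: ranked output as True (relevant) / False.
    rankings = [[True, False, True, False], [False, True, True]]
    relevant_counts = [2, 2]
    print(averaged_curve(rankings, relevant_counts))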
When the number of terms matching between document and query (co-
ordination level) is an independent variable, a set of average recall and
precision values can be obtained for a query at each degree of match, i.e. at
1, 2, 3, ... matching terms. A problem arises because not all queries have the
same number of terms, so that the average will be over different numbers of
queries at some co-ordination levels. One can examine only subsets consisting