Information Retrieval Experiment
The pragmatics of information retrieval experimentation
Jean M. Tague
Edited by Karen Sparck Jones
Butterworth & Company
Query-Term Incidence Array:

                    Term number
Query number   1  2  3  4  5  6  7  8
     1         1  1  1  1  0  0  0  0
     2         1  1  0  0  1  0  0  0
     3         0  0  0  0  1  1  1  0
     4         0  0  0  0  0  0  1  1
Query-Document Relevance Array:

                      Document number
Query number   1  2  3  4  5  6  7  8  9  10
     1         1  0  1  0  1  0  0  1  0  0
     2         1  1  0  0  0  0  0  1  0  1
     3         0  1  1  0  0  0  1  0  0  0
     4         0  0  0  0  0  0  1  0  0  1
From these arrays, the average precision and recall at each co-ordination
level will be as in Table 5.2.
TABLE 5.2
Co-ordination level   Average recall    Average precision
        1             11/13 = 0.846     11/27 = 0.407
        2              9/13 = 0.692      9/13 = 0.692
        3              5/13 = 0.385       5/6 = 0.833
        4              1/13 = 0.077       1/1 = 1.000
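The co-ordination-level figures pool the counts over all four queries: at level 1, for example, 27 documents are retrieved in total, of which 11 are relevant, out of 13 relevant documents overall. The chapter's document-term incidence array is not reproduced in this excerpt, so the sketch below takes all three arrays as parameters and is demonstrated on small hypothetical data; the function name `coordination_level_averages` and the toy arrays are illustrative, not from the text.

```python
def coordination_level_averages(query_term, doc_term, relevance, level):
    """Pooled average recall and precision when every document sharing
    at least `level` index terms with a query counts as retrieved."""
    retrieved = hits = 0
    total_relevant = sum(map(sum, relevance))
    for q, rel_row in zip(query_term, relevance):
        for d, doc in enumerate(doc_term):
            shared = sum(a and b for a, b in zip(q, doc))  # co-ordination level
            if shared >= level:
                retrieved += 1
                hits += rel_row[d]
    recall = hits / total_relevant
    precision = hits / retrieved if retrieved else 1.0
    return recall, precision

# Hypothetical toy data: 2 queries, 3 documents, 3 index terms.
queries = [[1, 1, 0], [0, 1, 1]]
docs = [[1, 0, 1], [1, 1, 0], [0, 1, 1]]
rel = [[1, 1, 0], [0, 0, 1]]

print(coordination_level_averages(queries, docs, rel, 1))  # (1.0, 0.5)
print(coordination_level_averages(queries, docs, rel, 2))  # (2/3, 1.0)
```

Raising the level retrieves fewer documents, so recall falls while precision tends to rise, exactly the pattern in Table 5.2.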
In the second method, documents are ranked by the query-document cosine
similarity measure. Average recall and precision values at each of the 10
possible document retrieval cutoff ranks are given in Table 5.3.
TABLE 5.3
Document cutoff level   Average recall   Average precision
         1                  0.231            0.750
         2                  0.538            0.875
         3                  0.692            0.750
         4                  0.692            0.562
         5                  0.769            0.500
         6                  0.846            0.458
         7                  0.846            0.393
         8                  0.846            0.344
         9                  0.923            0.333
        10                  1.000            0.325
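The second method can be sketched the same way: rank the documents for each query by the cosine of the angle between the incidence vectors, then pool relevant-retrieved counts at each cutoff rank. Since the document-term array is not reproduced in this excerpt, the toy arrays and the name `averages_at_cutoff` below are again hypothetical.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two term-incidence vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def averages_at_cutoff(query_term, doc_term, relevance, k):
    """Pooled average recall and precision when the top-k documents
    by cosine similarity are retrieved for every query."""
    hits = 0
    total_relevant = sum(map(sum, relevance))
    for q, rel_row in zip(query_term, relevance):
        ranked = sorted(range(len(doc_term)),
                        key=lambda d: cosine(q, doc_term[d]),
                        reverse=True)
        hits += sum(rel_row[d] for d in ranked[:k])
    recall = hits / total_relevant
    precision = hits / (k * len(query_term))
    return recall, precision

# Hypothetical toy data: 2 queries, 3 documents, 3 index terms.
queries = [[1, 1, 0], [0, 1, 1]]
docs = [[1, 0, 1], [1, 1, 0], [0, 1, 1]]
rel = [[1, 1, 0], [0, 0, 1]]

print(averages_at_cutoff(queries, docs, rel, 1))  # (2/3, 1.0)
print(averages_at_cutoff(queries, docs, rel, 3))  # (1.0, 0.5)
```

As the cutoff deepens, more relevant documents are found (recall rises toward 1.0) while more non-relevant documents are swept in (precision falls), mirroring Table 5.3.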
Before determining precision at standard recall values, several decisions
must be made. First, a precision value must be assigned for recall = 1, and
a single precision value chosen for recall values which occur more than
once, such as 0.692.
Either a linearly interpolated value or a minimum precision of 0 may be