to assign numbers. He points out that there is nothing sacrosanct about the
well-known scales, and they can be suitably modified by the investigator.
The interaction between the client or user and the search analyst or reference
librarian, known as query negotiation, is being increasingly studied. Zipperer7
has analysed this interaction in terms of nine activity categories:
Question negotiation (presentation of query)
Profile development (vocabulary selection)
Tutorial activities (explanation)
Search type selection (current awareness or retrospective)
Strategy formulation (search statement specification)
System description
Database selection
Administrative procedures
Diversionary activities (interruptions)
Although these activities relate more to batch than online retrieval and
thus might be modified for an interactive environment, it is important that
this kind of analysis be standardized, so that results from different studies
may be compared.
Evaluation
Historically, the 'evidence' of information retrieval experiments has been in
the form of retrieval effectiveness measures, and more specifically recall and
precision. Cleverdon1 pointed out the reason for the continuing popularity
of these two measures:
'The unarguable fact, however, is that they are fundamental requirements
of the users, and it is quite unrealistic to try to measure how effectively a
system or subsystem is operating without bringing in recall and precision.'
How one calculates recall and precision depends on the ordering of the
output. There are four possibilities:
Unordered output, i.e. output is the retrieved set.
Ranked output, with possible ties in ranking.
Totally ranked output, i.e. each document has a unique rank.
Weighted output, i.e. each document has a weight.
If the retrieval set is unordered, then a four-way partition of the full
database is made to determine recall and precision:
a is the number of relevant and retrieved references.
b is the number of non-relevant and retrieved references.
c is the number of relevant and non-retrieved references.
d is the number of non-relevant and non-retrieved references.
n = a + b + c + d is the total number of references in the database.
Four measures have been defined in terms of this partition:
recall = a/(a + c)
precision = a/(a + b)
fallout = b/(b + d)
generality = (a + c)/(a + b + c + d)
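As a rough illustration of these definitions, the following Python sketch computes the four measures directly from the counts a, b, c and d of the partition above; the function name, guard clauses for empty cells, and the example counts are illustrative additions, not part of the original text.

```python
def contingency_measures(a, b, c, d):
    """Compute recall, precision, fallout and generality from the
    four-way partition of the database:

      a: relevant and retrieved
      b: non-relevant and retrieved
      c: relevant and non-retrieved
      d: non-relevant and non-retrieved
    """
    n = a + b + c + d                               # total references in the database
    recall = a / (a + c) if (a + c) else 0.0        # proportion of relevant references retrieved
    precision = a / (a + b) if (a + b) else 0.0     # proportion of retrieved references that are relevant
    fallout = b / (b + d) if (b + d) else 0.0       # proportion of non-relevant references retrieved
    generality = (a + c) / n if n else 0.0          # proportion of the database that is relevant
    return recall, precision, fallout, generality


# Example (hypothetical counts): 30 relevant retrieved, 20 non-relevant
# retrieved, 10 relevant missed, 940 non-relevant not retrieved.
print(contingency_measures(30, 20, 10, 940))
# (0.75, 0.6, 0.0208..., 0.04)
```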