The pragmatics of information retrieval experimentation
Jean M. Tague
In: Karen Sparck Jones (ed.), Information Retrieval Experiment, Butterworth & Company
too many questions about the objectivity of the results. In fact, the
investigator should probably function only as a planner. In other words, he
or she should not select, search, or evaluate queries, but rather make decisions
about procedures for selection, searching, and evaluation.
Great variability has been exhibited by information retrieval experimenters in their methods of obtaining test queries. Much the same dilemma arises here as with the selection of a database. One may either solicit the co-operation of the actual users of a system or use queries which are in some sense artificial but under greater control of the investigator. The dichotomy is really more of a continuum: at one end is the user-dominated query and search process, in which all decisions relating to the initial topics, the direction and length of the search, and the evaluation of output are controlled by the user; at the other is the experimenter-dominated search, where these decisions are made by the experimenter.
Most experiments lie somewhere between. In Cranfield 2, the authors of
source documents framed questions based on their papers and then evaluated
the relevance of all references in the original paper. This method, at least,
gives an initial relevant set. Other documents in the collection were assessed
by judges, not the user. In other experiments, written queries from the history
files of an operational system were used with no attempt made to contact the
originators. Alternatively, queries can be manufactured by artificial means, such as using a document's title as the query and its references as the relevant documents.
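To make this artificial approach concrete, the sketch below (a hypothetical illustration, not part of the original text) shows how title-based queries and their initial relevant sets might be derived from a collection in which each record carries an identifier, a title, and a list of cited references; the field names and the sample records are assumptions:

    # Sketch: constructing artificial test queries and initial relevance
    # judgements from source documents. The record layout ('id', 'title',
    # 'references') and the sample data are illustrative assumptions.

    def build_test_queries(documents):
        """Use each document's title as a query and its cited references
        (restricted to those present in the collection) as the initial
        relevant set for that query."""
        collection_ids = {doc["id"] for doc in documents}
        queries = []
        for doc in documents:
            relevant = [ref for ref in doc["references"] if ref in collection_ids]
            if relevant:  # ignore documents citing nothing inside the collection
                queries.append({"query": doc["title"], "relevant": relevant})
        return queries

    if __name__ == "__main__":
        docs = [
            {"id": "d1", "title": "Automatic indexing of scientific papers",
             "references": ["d2", "d3", "x9"]},
            {"id": "d2", "title": "Term weighting in document retrieval",
             "references": []},
            {"id": "d3", "title": "Evaluation of retrieval effectiveness",
             "references": ["d2"]},
        ]
        for q in build_test_queries(docs):
            print(q["query"], "->", q["relevant"])

Documents retrieved by a search but not among the cited references would, as in Cranfield 2, still require assessment by judges.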
A problem in using bona fide users is to secure their co-operation,
particularly if it means there will be constraints on the search process such as
size of file searched or length of search time, and if users are expected to
return evaluated output. The user will generally feel that he or she should be
receiving something useful in return for his or her time. Free searches on commercial systems with large databases are one inducement. However, some bias will result from this approach, particularly with respect to cost-effectiveness. In an environment where users normally pay to do searches, for example, they may be tempted to conduct broader searches for free than they normally would. Some effort to control this factor, such as limiting free offline printing
to some maximum number, is probably necessary. With small experimental
files, where no large immediate benefit accrues to the user, the most effective
approach may be a payment. This method is more successful with indigent students than with highly paid professionals, given the rate most information
scientists can afford. Payment of participants should, wherever possible, be
included in research grant applications.
Getting assessments of document relevance is an even greater problem
than getting queries. With real users, it is best to obtain these immediately
after the search and before the user has escaped the premises. If this is not
possible, one can again offer an inducement, such as payment or copies of
documents to users who complete their evaluations.
Ideally, users should be randomly selected from a pool by the investigator.
In practice this is rarely possible. Users are normally self-selected because of
the degree of co-operation required of them. The best the investigator can do
is to attempt to get a reasonable mix with regard to user traits such as subject
background, experience in using the system, and professional level. If
random selection is possible then, of course, it should be used. As with
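Where a pool of volunteers is available, one simple way to approximate such a mix is to stratify the pool by one or more traits and draw users at random within each stratum. The sketch below is purely illustrative; the pool, the trait field, and the quota per stratum are assumptions rather than anything prescribed here:

    import random
    from collections import defaultdict

    # Sketch: drawing a mixed sample of users from a volunteer pool by
    # stratifying on a trait such as professional level. The pool, the
    # trait name, and the quota per stratum are illustrative assumptions.

    def stratified_sample(pool, trait, per_stratum, seed=0):
        """Group users by the given trait and draw up to `per_stratum`
        users at random from each group."""
        rng = random.Random(seed)
        strata = defaultdict(list)
        for user in pool:
            strata[user[trait]].append(user)
        sample = []
        for group in strata.values():
            rng.shuffle(group)
            sample.extend(group[:per_stratum])
        return sample

    if __name__ == "__main__":
        pool = [
            {"name": "u1", "level": "student"},
            {"name": "u2", "level": "student"},
            {"name": "u3", "level": "faculty"},
            {"name": "u4", "level": "professional"},
            {"name": "u5", "level": "professional"},
        ]
        for user in stratified_sample(pool, trait="level", per_stratum=1):
            print(user["name"], user["level"])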
databases, conclusions must be restricted to the population from which the