IRE
Information Retrieval Experiment
The pragmatics of information retrieval experimentation
chapter
Jean M. Tague
Butterworth & Company
Karen Sparck Jones
least two command language modes: learner mode and experienced or
abbreviated mode. A `help' feature, which permits users to access online
explanations of the various commands in the retrieval system, is very useful.
(4) Retrieval systems should provide facilities for automatic collection of
data needed by the experimenter, for example, number of search statements
entered, number of documents retrieved by a search statement, number of
postings for any term, search time (both connect time and CPU time).
(5) Retrieval systems to be used in experiments should provide a variety of
outputs: title, full citation, abstracts, term or term combination causing
retrieval, and so on. Facilities should be provided for offline printing of large
output sets. If output is to be evaluated, the system should provide it in a
suitable form. Users may want to retain the output and so it should be
duplicated, one set for the user, one for the experimenter.
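The data items listed in point (4) can be pictured as one log record per search. The following is only a minimal sketch of such a record: the class names, field names, and sample values are all hypothetical and do not describe any particular retrieval system.

```python
from dataclasses import dataclass, field


@dataclass
class StatementRecord:
    """One search statement and the counts reported for it."""
    text: str
    documents_retrieved: int
    # Number of postings for each term used in the statement.
    postings_by_term: dict[str, int] = field(default_factory=dict)


@dataclass
class SearchLog:
    """Data collected automatically for a single search (hypothetical layout)."""
    query_id: str
    connect_seconds: float = 0.0
    cpu_seconds: float = 0.0
    statements: list[StatementRecord] = field(default_factory=list)

    def record(self, text, documents_retrieved, postings_by_term=None):
        """Append one search statement to the log."""
        self.statements.append(
            StatementRecord(text, documents_retrieved, dict(postings_by_term or {}))
        )

    @property
    def statement_count(self) -> int:
        """Number of search statements entered during the search."""
        return len(self.statements)


# Illustrative values only; they are not taken from any experiment.
log = SearchLog(query_id="Q1")
log.record("retrieval AND evaluation", 42, {"retrieval": 310, "evaluation": 128})
log.record("1 AND recall", 7, {"recall": 55})
log.connect_seconds = 95.0
log.cpu_seconds = 1.8
```

Keeping connect time and CPU time as separate fields mirrors the distinction drawn in point (4); the experimenter can then derive whichever criterion measure the design calls for.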
5.7 Decision 7: How will treatments be assigned to experimental units?
A complete information retrieval experiment is concerned with assessing the
effects of one or more classes of treatments or factors on one or more criterion
measures, where the criterion measure is determined for each of a sample of
experimental units. For example, the treatments might be different degrees
of vocabulary control, the experimental units searches of queries on a
database, and the criterion measures recall and precision. Or the treatments
might be degree of search delegation to an intermediary, the experimental
units online searches of queries on several systems, and the criterion measure
total search time. In a partial test, the treatments might be represented by
levels of indexer training, the experimental units documents to be indexed,
and the criterion measure indexing time. In multi-factor experiments, there
is more than a single set of treatments or factors. For example, in complete
retrieval experiments, frequently the type of indexing language and the
searcher are varied over the query set, giving a two-factor experiment. Or in
an online experiment, three factors might be degree of delegation, online
system, and searcher. A source of experimental units which is not expected
to interact with the factors is called a block. In information retrieval
experiments, sources of queries or users such as different libraries might be
considered blocks.
In a completely crossed factorial experiment, at least one experimental
unit is assigned to each possible combination of factors. Thus, in a
completely crossed two-factor design, indexing language by searcher,
unique queries would be assigned to each combination of searcher and
indexing language. If we let g1, g2, g3 represent the three languages,
s1, s2, s3, s4 the four searchers, and Q1, Q2, . . . , Q12 twelve query
sets, a completely crossed design would be represented as follows:
        g1     g2     g3
s1      Q1     Q2     Q3
s2      Q4     Q5     Q6
s3      Q7     Q8     Q9
s4      Q10    Q11    Q12
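
The layout above can be generated mechanically by pairing every searcher with every indexing language and giving each pair its own query set. The sketch below assumes the four searcher rows and three language columns shown in the layout; the function name `crossed_design` is hypothetical and chosen only for illustration.

```python
from itertools import product

languages = ["g1", "g2", "g3"]
searchers = ["s1", "s2", "s3", "s4"]
query_sets = [f"Q{i}" for i in range(1, 13)]  # Q1 .. Q12


def crossed_design(row_factor, column_factor, units):
    """Assign exactly one unit to every (row, column) combination."""
    cells = list(product(row_factor, column_factor))
    if len(units) != len(cells):
        raise ValueError(
            f"need {len(cells)} units for a completely crossed design, "
            f"got {len(units)}"
        )
    # Cells are filled row by row, matching the layout in the text.
    return dict(zip(cells, units))


design = crossed_design(searchers, languages, query_sets)

# Print the design as a searcher-by-language table.
print("    " + "  ".join(f"{g:<4}" for g in languages))
for s in searchers:
    print(f"{s}  " + "  ".join(f"{design[(s, g)]:<4}" for g in languages))
```

The length check makes the defining property explicit: a completely crossed design leaves no combination of factor levels without an experimental unit.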