IRE
Information Retrieval Experiment
The pragmatics of information retrieval experimentation
chapter
Jean M. Tague
Butterworth & Company
Karen Sparck Jones
All rights reserved. No part of this publication may be reproduced
or transmitted in any form or by any means, including photocopying
and recording, without the written permission of the copyright holder,
application for which should be addressed to the Publishers. Such
written permission must also be obtained before any part of this
publication is stored in a retrieval system of any nature.
Decision 7: How will treatments be assigned to experimental units?
The problem with this approach is that there is such a large variation among
queries with respect to recall, precision, and other criterion measures of
interest to experimenters that these variations may mask variations caused
by the indexing language, which the experiment is supposed to determine.
Another approach is to use a design with repeated measures. As the name
implies, this means that the same experimental unit is subjected to the
treatments of interest, i.e. each query is searched using all three indexing
languages. Such designs permit control over individual differences. Thus,
using the same notation as in the previous example, a two-factor experiment
(language by searcher) with repeated measures would look like this:
        g1    g2    g3
s1      Q1    Q1    Q1
s2      Q2    Q2    Q2
s3      Q3    Q3    Q3
s4      Q4    Q4    Q4
where Q1, Q2, Q3, Q4 are sets of n/4 queries, n being the total number of
queries available in the experiment.
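As an illustration only, the layout above can be sketched in Python. The function name, the query identifiers, and the fixed random seed are hypothetical choices made for the example; the language labels simply mirror the column headings of the table.

```python
import random

def repeated_measures_layout(queries, searchers, languages, seed=0):
    """Give each searcher one query set and search that same set
    under every indexing language (the repeated measure)."""
    if len(queries) % len(searchers) != 0:
        raise ValueError("queries must divide evenly among searchers")
    rng = random.Random(seed)
    shuffled = list(queries)
    rng.shuffle(shuffled)  # random assembly of the query sets
    size = len(shuffled) // len(searchers)
    layout = {}
    for i, searcher in enumerate(searchers):
        query_set = shuffled[i * size:(i + 1) * size]
        for language in languages:
            # the same set appears in every language cell of this row
            layout[(searcher, language)] = list(query_set)
    return layout

queries = [f"q{k}" for k in range(1, 13)]  # n = 12 hypothetical queries
searchers = ["s1", "s2", "s3", "s4"]
languages = ["g1", "g2", "g3"]
layout = repeated_measures_layout(queries, searchers, languages)
```

Each row of the resulting layout holds one set of n/4 = 3 queries, repeated across the three language columns.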
If instead of assigning different query sets to each searcher one assigns the
same set, then the query has in effect become a third factor in a language by
searcher by query experiment.
Repeated measures designs have the advantage that fewer queries are
needed for the same reliability. However, they have the drawback of
introducing possible 'sequence' effects: the effects of practice, training, or
learning carried over from a search in one indexing language to a search of
the same query in another. In his standard text on experimental design,
Winer¹⁸ says:
'In experiments where sequence effects are likely to be marked, a repeated
measure design should be avoided. In cases where sequence effects are
likely to be small relative to treatment effects, repeated measure designs
can be used. Randomizing the order of administration tends to prevent
confounding of treatment and sequence effects.'
The experimenter must himself judge the magnitude of sequence effects on
searchers. One would expect them to be greater with novice than with
experienced personnel.
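The randomization of order mentioned in the quotation might be carried out along the following lines. This is a minimal sketch: the function name, the query and language labels, and the fixed seed are assumptions made for the example, and an independent random order is drawn for each query.

```python
import random

def randomized_orders(queries, languages, seed=0):
    """Draw an independent random order of the indexing languages
    for each query, so that no language is always searched first."""
    rng = random.Random(seed)
    orders = {}
    for query in queries:
        order = list(languages)
        rng.shuffle(order)  # random order of administration
        orders[query] = order
    return orders

orders = randomized_orders(["q1", "q2", "q3", "q4"], ["g1", "g2", "g3"])
for query, order in orders.items():
    print(query, "->", " then ".join(order))
```

Simple randomization of this kind does not guarantee that each language appears equally often in each position; it only makes a systematic confounding of language with order unlikely.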
Another way to control sequence effects is by using a Latin square design.
A Latin square is an n by n table or array in which the entries in the table are
n distinct symbols, assigned so that each appears once in each row and in
each column. For example, here are two different 3 by 3 Latin squares:
1 2 3        1 3 2
2 3 1        3 2 1
3 1 2        2 1 3
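The defining property can be checked mechanically. The sketch below, with hypothetical function names, tests whether a table is a Latin square and builds one by cyclic shifting, which is one simple construction among many; for n = 3 it reproduces the first square above.

```python
def is_latin_square(square):
    """True if every row and every column contains each of the
    n symbols exactly once."""
    n = len(square)
    if n == 0:
        return False
    symbols = set(square[0])
    if len(symbols) != n:
        return False
    rows_ok = all(len(row) == n and set(row) == symbols for row in square)
    cols_ok = all(set(col) == symbols for col in zip(*square))
    return rows_ok and cols_ok

def cyclic_latin_square(n):
    """Build an n by n Latin square by shifting the first row
    one place to the left for each successive row."""
    return [[(r + c) % n + 1 for c in range(n)] for r in range(n)]

first = cyclic_latin_square(3)
second = [[1, 3, 2], [3, 2, 1], [2, 1, 3]]
print(is_latin_square(first), is_latin_square(second))
```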
In experimental design, the rows and columns represent levels of two
factors (for example, indexing language and search order). The entries in the
body of the table represent experimental units or sets of randomly assembled
experimental units (for example, sets of queries). Note that for a Latin square
to be used as an experimental design one must have
N(R) = N(C) = N(Q)