IRE
Information Retrieval Experiment
The pragmatics of information retrieval experimentation
chapter
Jean M. Tague
Butterworth & Company
Karen Sparck Jones
All rights reserved. No part of this publication may be reproduced
or transmitted in any form or by any means, including photocopying
and recording, without the written permission of the copyright holder,
application for which should be addressed to the Publishers. Such
written permission must also be obtained before any part of this
publication is stored in a retrieval system of any nature.
Decision 9: How to analyse the data?
TABLE 5.6

                         Type of variable
Design                   Approx. normal, equal variances     Continuous, discrete, some ordinal

Single factor
  Independent samples    One-way ANOVA                       Kruskal-Wallis test
  Dependent samples      One-way ANOVA, repeated measures    Friedman test
  Complete blocks        One-way ANOVA, complete blocks      Noether's T test
  Incomplete blocks      One-way ANOVA, incomplete blocks    Durbin test

that the F test is relatively insensitive to moderate departures from normality.
Thus, it may be used when the data are only approximately normal. Many of
the variance stabilizing transformations also make the data more normal.
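
As an illustration of the F test behind the independent-samples one-way ANOVA listed in Table 5.6, the following is a minimal sketch, not a full analysis routine. The function name `one_way_anova_f` and the sample figures are hypothetical; the code returns only the F ratio and its degrees of freedom, leaving the significance lookup to tables or a statistics package.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for independent samples.

    `groups` is a list of lists, one list of observations per treatment.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more observations than groups")

    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]

    # Between-treatments and within-treatments sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_within = n - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical numbers of relevant documents retrieved under three treatments.
treatments = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
f_ratio, df_b, df_w = one_way_anova_f(treatments)
```

The F ratio would then be compared with the critical value of the F distribution on `df_b` and `df_w` degrees of freedom.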
In general, data consisting of counts, e.g. number of relevant documents,
or times, e.g. search time, should be analysable by parametric methods. The
arcsin transformation is useful in stabilizing the variances and improving the
normality of proportions such as recall and precision. Times which are
skewed towards low values can have their distributions improved by the
logarithmic transformation.
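
The two transformations mentioned above can be sketched as follows. This assumes the common arcsine square-root form of the arcsin transformation and a natural logarithm for times; the text does not say which variant is intended, and the function names and data are hypothetical.

```python
import math


def arcsin_sqrt(p):
    """Arcsine square-root transform of a proportion p in [0, 1] (assumed variant)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("proportion must lie in [0, 1]")
    return math.asin(math.sqrt(p))


def log_transform(t):
    """Natural-log transform of a strictly positive time."""
    if t <= 0:
        raise ValueError("time must be positive")
    return math.log(t)


# Hypothetical recall values and search times (seconds).
recall_values = [0.10, 0.25, 0.50, 0.90]
search_times = [35.0, 42.0, 58.0, 310.0]

transformed_recall = [arcsin_sqrt(p) for p in recall_values]
transformed_times = [log_transform(t) for t in search_times]
```

The transformed values, rather than the raw proportions or times, would then be passed to the parametric analysis.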
Following a significant ANOVA, i.e. a significant difference in treatments,
the experimenter may wish to test which particular treatment pairs differ. A
number of tests are available for such contrasts: the Newman-Keuls,
Duncan, Tukey, and Scheffé tests. Details may be found in Winer.
Wherever possible, a parametric test is to be preferred to a non-parametric
one because of its greater efficiency. Pitman (see Noether) defines efficiency as
follows:
'If we have two tests of the same hypothesis and significance level and if
for the same power with respect to the same alternative one test requires
a sample size N1 and the other a sample size N2, the relative efficiency of
the first with respect to the second is given by e = N2/N1.'
Noether gives specific examples of the efficiency of non-parametric tests
against normal curve alternatives. The asymptotic (i.e. large sample)
efficiency of the T, Kruskal-Wallis, Durbin, Friedman, and
Wilcoxon-Mann-Whitney tests will not fall below 0.864 and may be as high as 0.955.
The Sign test, however, has an efficiency of only 0.64.
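
To make the definition e = N2/N1 concrete, the sketch below turns the efficiencies quoted above into approximate sample sizes. It is illustrative only. The function names are hypothetical, the reference sample of 100 is an arbitrary assumption, and the quoted figures are asymptotic, so the results are rough guides for large samples.

```python
import math


def relative_efficiency(n1, n2):
    """Efficiency of test 1 relative to test 2: e = N2 / N1."""
    return n2 / n1


def sample_size_needed(n_reference, efficiency):
    """Observations a test with the given efficiency (relative to a reference
    test) needs to match the power of the reference test at n_reference."""
    if efficiency <= 0:
        raise ValueError("efficiency must be positive")
    return math.ceil(n_reference / efficiency)


# Assume the parametric test needs 100 observations (hypothetical figure).
n_parametric = 100
for e in (0.955, 0.864, 0.64):
    print(e, sample_size_needed(n_parametric, e))
```

Under these assumptions a test with efficiency 0.955 needs about 105 observations, one with 0.864 about 116, and the Sign test, at 0.64, about 157.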
Another advantage of parametric tests is that they are easier to compute.
Most non-parametric tests require ranking the observations, an operation
whose time is proportional to n², or at least n log n. Parametric tests, on the
other hand, are based on adding and squaring, operations whose time is
proportional to n. For large samples, this difference may be important.
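
The contrast in computing effort can be sketched as follows. Ranking is done here by sorting, which takes time proportional to n log n, with tied observations given the average of their ranks. The sums used by parametric tests need only a single pass. Both function names are hypothetical and the code is a sketch, not a tuned implementation.

```python
def midranks(values):
    """Rank observations from 1 upwards, giving ties the average of their ranks.

    Uses a sort, so the time is proportional to n log n.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of observations tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def sum_and_sum_of_squares(values):
    """Accumulate the sum and sum of squares in one pass (time proportional to n)."""
    total = 0.0
    total_sq = 0.0
    for x in values:
        total += x
        total_sq += x * x
    return total, total_sq
```

For example, `midranks([30, 10, 20, 20])` gives `[4.0, 1.0, 2.5, 2.5]`, and `sum_and_sum_of_squares([1, 2, 3])` gives `(6.0, 14.0)`.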
Exploring relationships
Exploring relationships may involve either:
(1) Determining if two variables are related or independent, e.g. is search
time related to searcher experience?
(2) Estimating the degree of relationship between them, e.g. what is the
correlation between the frequency of use of a document and its age?
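
As one possible way of estimating the degree of relationship in (2), the sketch below computes the product-moment (Pearson) correlation coefficient. The choice of this coefficient is an assumption made for illustration, not a method prescribed by the text, and the function name and data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: document age in years and number of uses in a year.
age = [1, 2, 3, 5, 8]
uses = [40, 31, 25, 12, 4]
r = pearson_r(age, uses)
```

A value of `r` near -1 would indicate that use falls steadily with age. A value near 0 would suggest no linear relationship, which is not the same as independence.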