IRE
Information Retrieval Experiment
The pragmatics of information retrieval experimentation
chapter
Jean M. Tague
Butterworth & Company
Karen Sparck Jones
All rights reserved. No part of this publication may be reproduced
or transmitted in any form or by any means, including photocopying
and recording, without the written permission of the copyright holder,
application for which should be addressed to the Publishers. Such
written permission must also be obtained before any part of this
publication is stored in a retrieval system of any nature.
(2) The Delphi technique, in which individuals are shown an analysis of
responses from all members of the group and permitted to revise their
own responses. The process is iterated until convergence (agreement)
among group members is achieved. No report has been received of a
Delphi process which did not converge (for obvious reasons!).
5.9 Decision 9: How to analyse the data?
Analysis of results is either descriptive or inferential. That is, one may simply
summarize the data obtained or one may generalize and make predictions
from it about larger sets of data or populations.
As mentioned earlier, the techniques of statistical inference and decision-
making are based on the assumption that the data constitutes a random
sample from the population, i.e. a sample selected in such a way that each
possible sample of the same size has the same probability of occurring. In
practice, we cannot always guarantee that this condition has been met. A
sample is usually considered suitably random if some kind of chance
mechanism has been used in its selection and there are no apparent biases.
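As an illustration of such a chance mechanism, the following minimal sketch draws a simple random sample without replacement, so that every subset of the requested size is equally likely. The function name, the document identifiers, and the seed are assumptions made for the example, not part of any particular test collection.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct items; every subset of size n is equally likely."""
    rng = random.Random(seed)          # seeded generator so the draw is repeatable
    return rng.sample(list(population), n)

# Hypothetical collection of 1000 document identifiers (illustrative only).
documents = [f"doc{i:04d}" for i in range(1000)]
sample = simple_random_sample(documents, 20, seed=42)
print(sample[:5])
```

Fixing the seed makes the selection reproducible, which helps when the same sample has to be reconstructed later. It does not remove biases that are already present in the population list itself.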
It is only in the past few years that inferential rather than descriptive
methods have been used at all widely in information retrieval testing. One
reason for earlier neglect may have been that information scientists were not
familiar with statistical inference. Another is that sample document and
query sets were distinctly non-random. However, the importance of
randomization and experimental design is increasingly recognized in retrieval
experiments and so inferential tests should be more prevalent in the future.
The value of statistical inference lies in its generalizing potential. Unless
information science is able to derive general results or `laws', it will remain
a very primitive science.
Descriptive methods
Descriptive methods encompass:
(1) The various graphical and tabular displays of variable frequencies and
relationships, such as the recall-precision curve, which have long been
part of information retrieval test methodology;
(2) The calculation of descriptive statistics measuring central tendency,
variability, association, and other characteristics.
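The points of a recall-precision curve, mentioned under (1), can be tabulated from a single ranked output roughly as follows. This is a sketch under stated assumptions: the function name, the ranking, and the relevance judgements are hypothetical, and recall and precision are computed at each rank where a relevant document appears.

```python
def recall_precision_points(ranking, relevant):
    """Return (recall, precision) at each rank where a relevant document occurs."""
    relevant = set(relevant)
    points = []
    hits = 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            recall = hits / len(relevant)   # share of all relevant documents found so far
            precision = hits / rank         # share of retrieved documents that are relevant
            points.append((recall, precision))
    return points

# Hypothetical ranked output and relevance judgements for one query.
ranking = ["d3", "d7", "d1", "d9", "d4"]
relevant = {"d3", "d1", "d4", "d8"}
points = recall_precision_points(ranking, relevant)
for recall, precision in points:
    print(f"recall={recall:.2f}  precision={precision:.2f}")
```

Plotting precision against recall for these points gives the familiar curve for one query. Averaging over queries requires a further convention that is not fixed here.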
Measures of central tendency include:
the arithmetic mean, or average value;
the median, or middle value;
the mode, or most frequent value.
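The three measures can be computed with Python's standard `statistics` module, as in this brief sketch. The data are invented for illustration: the number of relevant documents retrieved for each of nine hypothetical queries.

```python
import statistics

# Hypothetical counts of relevant documents retrieved for nine queries.
counts = [2, 4, 4, 5, 6, 4, 9, 3, 8]

mean = statistics.mean(counts)      # arithmetic mean: sum of values / number of values
median = statistics.median(counts)  # middle value of the sorted observations
mode = statistics.mode(counts)      # most frequent value

print(mean, median, mode)
```

For these values the mean is 5 while the median and mode are both 4. The few high counts pull the mean upward but leave the other two measures unchanged.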
Measures of variability include:
the variance, or average squared distance of the observations from their
mean;
the standard deviation, or square root of the variance;
the range, or difference between the smallest and largest values;