Information Retrieval Experiment (IRE), edited by Karen Sparck Jones
Chapter: The pragmatics of information retrieval experimentation, by Jean M. Tague
Butterworth & Company

(2) The Delphi technique, in which individuals are shown an analysis of responses from all members of the group and permitted to revise their own responses. The process is iterated until convergence (agreement) among group members is achieved. No report has been received of a Delphi process which did not converge (for obvious reasons!).

5.9 Decision 9: How to analyse the data?

Analysis of results is either descriptive or inferential. That is, one may simply summarize the data obtained or one may generalize and make predictions from it about larger sets of data or populations.

As mentioned earlier, the techniques of statistical inference and decision-making are based on the assumption that the data constitutes a random sample from the population, i.e. a sample selected in such a way that each possible sample of the same size has the same probability of occurring. In practice, we cannot always guarantee that this condition has been met. A sample is usually considered suitably random if some kind of chance mechanism has been used in its selection and there are no apparent biases.

It is only in the past few years that inferential rather than descriptive methods have been used at all widely in information retrieval testing. One reason for earlier neglect may have been that information scientists were not familiar with statistical inference.
Another is that sample document and query sets were distinctly non-random. However, the importance of randomization and experimental design is increasingly recognized in retrieval experiments and so inferential tests should be more prevalent in the future.

The value of statistical inference lies in its generalizing potential. Unless information science is able to derive general results or 'laws', it will remain a very primitive science.

Descriptive methods

Descriptive methods encompass:

(1) The various graphical and tabular displays of variable frequencies and relationships, such as the recall-precision curve, which have long been part of information retrieval test methodology.
(2) The calculation of descriptive statistics measuring central tendency, variability, association, and other characteristics.

Measures of central tendency include:
the arithmetic mean, or average value;
the median, or middle value;
the mode, or most frequent value.

Measures of variability include:
the variance, or averaged squared distance of the observations from their mean;
the standard deviation, or square root of the variance;
the range, or difference between the smallest and largest values;