5
The pragmatics of information retrieval experimentation
Jean M. Tague
The novice information scientist, though he or she may have thoroughly
studied the design and results of previous information retrieval tests and
clearly described the purpose of his/her own test, may still, when faced with
its implementation, have great difficulty in proceeding. Early information
retrieval experiments were of necessity ad hoc, and it is only in recent years
that a body of practice, based on the experiences of Cleverdon and later
investigators, has made possible a few recommendations on the pragmatics
of conducting information retrieval experiments.
The following remarks, though based to some extent on a study of the
major tests, including those described in later chapters of this book, are
heavily dependent on the author's own trials, tribulations, and mistakes. If
there is one lesson to be learned from experience, it is that the theoretically
optimum design can never be achieved, and the art of information retrieval
experimentation is to make the compromises that will least detract from the
usefulness of the results.
In determining experimental procedures, three aspects must be kept in
mind:
(1) The validity of the procedure; does it determine what the experimenter
wishes to determine? If a study is being made of the relation of document
scope to user satisfaction, does the use of number of citations as a measure
of scope and number of references marked 'relevant' as a measure of
satisfaction really fulfill this purpose?
(2) The reliability of the procedure; can it be replicated by another
experimenter? If one is addressing the problem of inter-indexer
consistency, will a test of the consistency of two indexers indexing 10
documents from a single journal provide results which can be replicated
elsewhere? A procedure may be reliable without being valid, i.e. it may
give consistent results but be measuring something else.
(3) The efficiency of the test procedures; how long will it take, how many
resources (people, computing, supplies, equipment) will it require, how
much will it cost? Is it sensible, for example, to assess the absolute recall
of searches when this means each user will have to peruse the entire
database? What limitations will this place on the size of the database?
(The consistency and recall measures mentioned in points (2) and (3) are
sketched briefly after this list.)