IRE
Information Retrieval Experiment
The pragmatics of information retrieval experimentation
chapter
Jean M. Tague
Butterworth & Company
Karen Sparck Jones
All rights reserved. No part of this publication may be reproduced
or transmitted in any form or by any means, including photocopying
and recording, without the written permission of the copyright holder,
application for which should be addressed to the Publishers. Such
written permission must also be obtained before any part of this
publication is stored in a retrieval system of any nature.
Decision 6: How to process queries?
During the actual experiment, the investigator must tread a fine line between
non-interference and rescue operations.
Although retrieval systems vary greatly in the facilities they provide for
searching, a few general comments can be made on ways to improve
efficiency or reduce the cost of developing, maintaining, and adapting them.
(1) Do not develop your own retrieval software unless absolutely necessary.
There are many retrieval systems now available commercially, such as IBM's
STAIRS or Stanford's SPIRES/BALLOTS. Consider the possibility of
writing pre- and post-processing programs which will permit the searches to
be processed by existing packages.
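As a minimal sketch of this idea, suppose a hypothetical package accepts one query per line in the form `QUERY <id> <expression>` and writes its results as `<query id> <document id>` lines. Both formats are assumptions for illustration and are not taken from STAIRS or SPIRES. The preprocessor translates the experiment's queries into the package's input, and the postprocessor regroups the package's output into per-query lists ready for evaluation.

```python
from collections import defaultdict


def preprocess(queries):
    """Translate experiment queries {query_id: boolean expression} into
    command lines for a hypothetical retrieval package."""
    lines = []
    for qid in sorted(queries):
        # Hypothetical command syntax; a real package defines its own.
        lines.append(f"QUERY {qid} {queries[qid]}")
    return lines


def postprocess(output_lines):
    """Group the package's (hypothetical) '<query id> <doc id>' output lines
    into {query_id: [doc ids in the order retrieved]}."""
    results = defaultdict(list)
    for line in output_lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines in the package output
        qid, doc_id = line.split()
        results[int(qid)].append(doc_id)
    return dict(results)


# Usage: queries go in through preprocess(), results come back through postprocess().
commands = preprocess({2: "retrieval AND evaluation", 1: "inverted AND file"})
ranked = postprocess(["1 D3", "1 D7", "2 D1"])
```

Because only these two small programs depend on the package's formats, switching to a different package means rewriting the translators rather than the experiment.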
(2) If local software must be developed, employ an experienced computer
specialist, at least as a consultant. Insist on a professional product, i.e.
software which is:
well-documented
structured
completely debugged before the experiment.
Documentation is essential to ensure that anyone, not just the original
designer, can use, maintain, and modify the software. Programs written in a
higher-level language such as COBOL should conform to industry standards to
ensure portability.
Structured programming implies:
top-down development
modularity
use of standard control structures.
These will be explained in turn.
Top-Down Development: The program is developed as a hierarchy of
processes or functions, beginning at the most general level and resolving each
process into more specific processes at the next lower level. For example,
searching a boolean statement against a database with an inverted file
structure could be analysed by the hierarchy chart shown in Figure 5.1.
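Figure 5.1 itself is not reproduced here, so the boxes below are an assumed decomposition rather than the one in the figure. The sketch writes each box of a plausible hierarchy as a function. At the top, `search` is resolved into parsing, evaluation, and output, and evaluation is in turn resolved into posting-list lookup and combination. For simplicity it assumes queries alternate terms with AND/OR and are evaluated left to right, with no operator precedence, and that the inverted file is a dictionary from term to document numbers.

```python
# Level 0: the whole task.
def search(query, inverted_file):
    """Search a boolean query against an inverted file; return sorted doc numbers."""
    terms, operators = parse_query(query)
    result = evaluate(terms, operators, inverted_file)
    return format_results(result)


# Level 1: the processes that make up search.
def parse_query(query):
    """Split 'a AND b OR c' into terms ['a', 'b', 'c'] and operators ['AND', 'OR']."""
    tokens = query.split()
    terms = [t.lower() for t in tokens[0::2]]
    operators = [t.upper() for t in tokens[1::2]]
    if len(terms) != len(operators) + 1:
        raise ValueError("query must alternate terms and operators")
    return terms, operators


def evaluate(terms, operators, inverted_file):
    """Combine posting sets strictly left to right (no precedence)."""
    result = get_postings(terms[0], inverted_file)
    for op, term in zip(operators, terms[1:]):
        result = combine(result, get_postings(term, inverted_file), op)
    return result


def format_results(doc_ids):
    """Present the retrieved set as an ascending list."""
    return sorted(doc_ids)


# Level 2: the processes that make up evaluate.
def get_postings(term, inverted_file):
    """Look up a term; an unknown term retrieves nothing."""
    return set(inverted_file.get(term, []))


def combine(left, right, op):
    """Apply one boolean operator to two posting sets."""
    if op == "AND":
        return left & right
    if op == "OR":
        return left | right
    raise ValueError(f"unknown operator: {op}")
```

With `inverted = {"boolean": [1, 2, 5], "search": [2, 3, 5], "file": [4]}`, the call `search("boolean AND search", inverted)` returns `[2, 5]`. Each level can be written and checked before the level beneath it exists, by temporarily replacing the lower functions with simple stubs.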
Modularity: Each module consists of about a page of code and corresponds
to a single function or process box in the hierarchy chart. Lower level modules
must be invoked by modules immediately above them in the hierarchy.
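The invocation rule can be checked mechanically once the hierarchy chart is written down as data. In the sketch below, the chart maps each module to the modules directly beneath it, and the list of calls is assumed to have been gathered by hand or from a cross-reference listing. Both the module names and the way the calls are collected are hypothetical. Any call that does not go from a module to one of its immediate subordinates is reported.

```python
# Hypothetical hierarchy chart for a boolean search program:
# each module maps to the modules immediately below it.
HIERARCHY = {
    "search": ["parse_query", "evaluate", "format_results"],
    "evaluate": ["get_postings", "combine"],
}


def check_calls(hierarchy, calls):
    """Return the (caller, callee) pairs that break the rule that a module
    may be invoked only by the module immediately above it."""
    violations = []
    for caller, callee in calls:
        if callee not in hierarchy.get(caller, []):
            violations.append((caller, callee))
    return violations


# Usage: the third call skips a level, so it is reported.
calls = [("search", "evaluate"), ("evaluate", "combine"), ("search", "get_postings")]
problems = check_calls(HIERARCHY, calls)
```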
Use of standard control structures: Structured programs are built from
three types of building blocks, or control structures:
sequence: one step after another
if-then-else: branching, or transfer of control
do while or do until: looping.
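All three structures appear in a routine that any inverted-file system needs: intersecting two ascending lists of document numbers to answer an AND query. The sketch below is illustrative only. Python has no separate do until statement: its `while` tests the condition before each pass, as in a do while loop, whereas a do until loop, which tests after the pass, would be written as `while True:` ending in an `if ...: break`.

```python
def intersect_postings(a, b):
    """AND two ascending lists of document numbers."""
    # Sequence: initialise, loop, then return, one step after another.
    result = []
    i, j = 0, 0
    # Do while: repeat as long as both lists have unread entries.
    while i < len(a) and j < len(b):
        # If-then-else: keep a common number, otherwise advance the smaller one.
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return result
```

For example, `intersect_postings([1, 3, 5, 9], [2, 3, 9, 10])` returns `[3, 9]`. Control only ever enters at the top of each structure and leaves at the bottom, which is what keeps such a routine easy to trace and modify.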
The aim of structured programming is to produce programs which are
understandable and can be easily debugged, modified, adapted, used in part,
etc., by people other than the initial programmer. Given the high turnover
rate in computer personnel, this kind of insurance is essential in information
retrieval experiments. In addition, one would hope that, if it is necessary to
develop an information retrieval system, it can be used in more than one
experiment. Hence the need for comprehensible software.
(3) Online retrieval software should provide searchers with the option of at