IRE
Information Retrieval Experiment
The pragmatics of information retrieval experimentation
chapter
Jean M. Tague
Butterworth & Company
Karen Sparck Jones
Will continue to move in this direction because of cost reductions and
advances in communication technology. For small databases, an online
system is strongly recommended and will usually be competitive. Because of
the importance of user-system interaction in information retrieval, one
would be inclined to predict that batch systems will eventually disappear. If
an online system is not available, a batch system might be used to produce
printed output which could then be used interactively.
Costs of developing an experimental database can be reduced by using, or
at least adding to, one already in existence. Additional kinds of indexing, for
example, or citations can be added if required by the investigator. However,
he or she should ensure that the collection meets the standards of randomness
previously described. Also, investigate the possibility that an existing
machine collection can be reformatted so that it can be processed by an
information retrieval package on the investigator's local computer system.
There is a problem of a more general nature with the use of existing
databases. If information science is to become a cohesive discipline,
knowledge, as in other sciences, must be cumulated on the basis of
independent experiments. One cannot confirm or contradict another's
general finding by using the same database. There is a grave danger that
findings in information retrieval will be the result of idiosyncrasies of popular
test collections, no matter how well or randomly selected. Confirmation and
rejection of conclusions must be based on independent random samples.
Commercial databases such as INSPEC, Chemical Abstracts Condensates, and
Science Citation Index can be used in two ways. Either tapes may be
purchased and used in conjunction with software developed or purchased for
the local computer, or the commercial online systems (ORBIT, DIALOG,
BRS, etc.) may be used directly. Purchase of tapes is expensive, so that one
is usually restricted to a small subset such as a single year, although some
database producers have reduced rates for experimental use. In using
commercial systems directly, one has their software available and pays only
for the time spent searching the system. If the objectives of the experiments
can be achieved by using commercial online systems, there seem to be good
economic reasons for choosing this alternative.
5.5 Decision 5: Where to get the queries?
Queries are verbalized information needs, and hence query decisions are
really people decisions. This question resolves into three:
(1) What is the source of the original query statement?
(2) Who controls the search process?
(3) Who evaluates the results?
Possible answers to any of the three are:
(1) An actual user of some operational system.
(2) The investigator.
(3) System personnel (operational or experimental).
(4) Any combination of the above.
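
The design space implied by the three questions and their candidate answers can be laid out mechanically. The following is only a minimal sketch, not a procedure from the chapter: the role and party labels are hypothetical names chosen for illustration, and for simplicity it assigns a single party to each role, ignoring the combinations allowed by answer (4). It also flags the one design in which the investigator fills all three roles.

```python
from itertools import product

# Hypothetical labels for the three questions (roles) and three single-party answers.
ROLES = ("query source", "search control", "evaluation")
PARTIES = ("user", "investigator", "system personnel")


def enumerate_designs():
    """Yield every assignment of exactly one party to each of the three roles."""
    for assignment in product(PARTIES, repeat=len(ROLES)):
        yield dict(zip(ROLES, assignment))


def investigator_does_everything(design):
    """Return True when the investigator fills all three roles."""
    return all(party == "investigator" for party in design.values())


designs = list(enumerate_designs())
acceptable = [d for d in designs if not investigator_does_everything(d)]

print(len(designs), "single-party designs;", len(acceptable), "remain after the exclusion")
```

Listing the designs in this way makes explicit which assignment of people to roles a given experiment has adopted.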
Clearly, the investigator should not do all three. Such a procedure raises