IRE Information Retrieval Experiment The pragmatics of information retrieval experimentation chapter Jean M. Tague Butterworth & Company Karen Sparck Jones All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, including photocopying and recording, without the written permission of the copyright holder, application for which should be addressed to the Publishers. Such written permission must also be obtained before any part of this publication is stored in a retrieval system of any nature.

will continue to move in this direction because of cost reductions and advances in communication technology. For small databases, an online system is strongly recommended and will usually be competitive. Because of the importance of user-system interaction in information retrieval, one would be inclined to predict that batch systems will eventually disappear. If an online system is not available, a batch system might be used to produce printed output which could then be used interactively.

Costs of developing an experimental database can be reduced by using, or at least adding to, one already in existence. Additional kinds of indexing, for example, or citations can be added if the investigator requires them. However, he or she should ensure that the collection meets the standards of randomness previously described. The investigator should also explore the possibility that an existing machine collection can be reformatted so that it can be processed by an information retrieval package on the investigator's local computer system.

There is a problem of a more general nature with the use of existing databases. If information science is to become a cohesive discipline, knowledge must, as in other sciences, be cumulated on the basis of independent experiments. One cannot confirm or contradict another's general finding by using the same database.
There is a grave danger that findings in information retrieval will be the result of idiosyncrasies of popular test collections, no matter how well or randomly selected. Confirmation and rejection of conclusions must be based on independent random samples.

Commercial databases such as INSPEC, Chemical Abstracts Condensates, and Science Citation Index can be used in two ways. Either tapes may be purchased and used in conjunction with software developed or purchased for the local computer, or the commercial online systems (ORBIT, DIALOG, BRS, etc.) may be used directly. Purchasing tapes is expensive, so one is usually restricted to a small subset, such as a single year, although some database producers offer reduced rates for experimental use. In using commercial systems directly, one has their software available and pays only for the time spent searching the system. If the objectives of the experiments can be achieved by using commercial online systems, there seem to be good economic reasons for choosing this alternative.

5.5 Decision 5: Where to get the queries?

Queries are verbalized information needs, and hence query decisions are really people decisions. This question resolves into three:

(1) What is the source of the original query statement?
(2) Who controls the search process?
(3) Who evaluates the results?

Possible answers to any of the three are:

(1) An actual user of some operational system.
(2) The investigator.
(3) System personnel (operational or experimental).
(4) Any combination of the above.

Clearly, the investigator should not do all three. Such a procedure raises