... too many questions about the objectivity of the results. In fact, the investigator should probably function only as a planner. In other words, he or she should not select, search, or evaluate queries, but rather make decisions about procedures for selection, searching, and evaluation.

Information retrieval experimenters have exhibited great variability in their methods of obtaining test queries. Much the same dilemma arises here as with the selection of a database. One may either solicit the co-operation of the actual users of a system or use queries which are in some sense artificial but under greater control of the investigator. The dichotomy is really a continuum: at one end is the user-dominated query and search process, in which all decisions relating to the initial topic, the direction and length of the search, and the evaluation of output are controlled by the user; at the other is the experimenter-dominated search, where these decisions are made by the experimenter. Most experiments lie somewhere between.

In Cranfield 2, the authors of source documents framed questions based on their papers and then evaluated the relevance of all references in the original paper. This method at least gives an initial relevant set; other documents in the collection were assessed by judges, not by the users. In other experiments, written queries from the history files of an operational system were used, with no attempt made to contact the originators. Alternatively, queries can be manufactured by artificial means, such as using a paper's title as the query and its references as the relevant documents (a rough sketch of this construction is given at the end of this passage).

A problem in using bona fide users is securing their co-operation, particularly if it means there will be constraints on the search process, such as the size of the file searched or the length of the search time, and if users are expected to return evaluated output. The user will generally feel that he or she should be receiving something useful in return for his or her time. Free searches on commercial systems with large databases are one inducement. However, some bias will result from this approach, particularly with respect to cost effectiveness: in an environment where users normally pay to do searches, for example, they may be tempted to do broader searches for free than they normally would. Some effort to control this factor, such as limiting free offline printing to some maximum number, is probably necessary. With small experimental files, where no large immediate benefit accrues to the user, the most effective approach may be a payment. This method is more successful with indigent students than with highly paid professionals, given the rates most information scientists can afford to pay. Payment of participants should, wherever possible, be included in research grant applications.
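To make the title-as-query construction concrete, the sketch below (in Python) derives a query and a relevant set from each source paper in a collection. It is a minimal sketch only: the Paper record, its field names, and the layout of the output are assumptions introduced for illustration, not anything prescribed in the text.

```python
# Sketch: manufacturing artificial test queries. Each source paper's title
# becomes a query, and the papers it cites (that are also in the collection)
# become the relevant set. Field names and identifiers are hypothetical.

from dataclasses import dataclass

@dataclass
class Paper:
    doc_id: str
    title: str
    references: list[str]  # doc_ids cited by this paper

def build_test_queries(collection: list[Paper]) -> list[dict]:
    """Derive (query, relevant set) pairs from the collection itself."""
    in_collection = {p.doc_id for p in collection}
    queries = []
    for paper in collection:
        # Only cited papers actually present in the test collection can
        # serve as judged relevant documents.
        relevant = [r for r in paper.references if r in in_collection]
        if relevant:
            queries.append({
                "query": paper.title,
                "relevant_docs": relevant,
                # The source paper is usually withheld from the search file
                # so the query cannot trivially retrieve its own origin.
                "source_doc": paper.doc_id,
            })
    return queries
```

Withholding the source paper itself from the file that is actually searched prevents each query from trivially retrieving the document its title was taken from.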
Getting assessments of document relevance is an even greater problem than getting queries. With real users, it is best to obtain these immediately after the search and before the user has escaped the premises. If this is not possible, one can again offer an inducement, such as payment or copies of documents to users who complete their evaluations. Ideally, users should be randomly selected from a pool by the investigator. In practice this is rarely possible: users are normally self-selected because of the degree of co-operation required of them. The best the investigator can do is to attempt to get a reasonable mix with regard to user traits such as subject background, experience in using the system, and professional level. If random selection is possible then, of course, it should be used (one way of carrying it out is sketched below). As with databases, conclusions must be restricted to the population from which the users were drawn.
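Where selection from a pool of volunteers is in fact possible, a stratified random draw is one way to secure the mix of user traits just mentioned. The sketch below is illustrative only: the stratifying trait, the volunteer pool, and the proportional-allocation rule are assumptions, not procedures taken from the text.

```python
# Sketch: drawing a stratified random sample of users from a volunteer pool,
# so that a trait such as experience level is represented proportionally.
# The trait, the pool, and the allocation rule are illustrative assumptions.

import random
from collections import defaultdict

def stratified_sample(pool: list[dict], trait: str, n: int, seed: int = 0) -> list[dict]:
    """Select roughly n users, allocating draws across the values of `trait`."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for user in pool:
        strata[user[trait]].append(user)

    sample = []
    for members in strata.values():
        # Proportional allocation, rounded, with at least one user per stratum.
        k = max(1, round(n * len(members) / len(pool)))
        sample.extend(rng.sample(members, min(k, len(members))))
    return sample

pool = [
    {"name": "A", "experience": "novice"},
    {"name": "B", "experience": "novice"},
    {"name": "C", "experience": "novice"},
    {"name": "D", "experience": "expert"},
    {"name": "E", "experience": "expert"},
]
print(stratified_sample(pool, trait="experience", n=3))
```

Proportional allocation with a floor of one user per stratum keeps small groups from disappearing from the sample entirely; even so, as noted above, conclusions apply only to the pool from which the sample was drawn.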