IRE Information Retrieval Experiment Retrieval system tests 1958-1978 chapter Karen Sparck Jones Butterworth & Company Karen Sparck Jones All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, including photocopying and recording, without the written permission of the copyright holder, application for which should be addressed to the Publishers. Such written permission must also be obtained before any part of this publication is stored in a retrieval system of any nature.
The decade 1968-1978
bibliographic record fields, like titles or abstracts, which were regarded as representing different indexing languages, rather than a single language, namely natural language, used for indexing from different sources. The UKCIS investigation53 illustrates this approach. However, somewhat greater care was taken in this group of tests in the treatment of such dependent variables as indexing exhaustivity than was usually the case in the previous decade's tests.
The second group of tests was indeed concerned with indexing rather than with the indexing language used, and particularly with exhaustivity and specificity. Thus Schumacher, March and Scheffler's test55, for instance, was concerned with the effects of exhaustivity on performance, as was ISILT. Tests on indexing language specificity, like Svenonius'56, also fall into this category of more detailed studies of single variables.
The conclusions about the importance of searching reached in some of the earlier tests were followed up in a number of studies of searching, which has also been a topic of interest to those responsible for online services.
Some of these studies, like that of Katzer57, were concerned with the form of the query in a narrow sense; others, like those of the UKCIS group or Leggate et al.58, with the properties of user queries; and yet others, like those of Barraclough et al.59 or Keen's EPSILON test60, with the behaviour of users in searching. Tests with broad or narrow question formulations, like Aitchison et al.'s, also fall into this group.
A particular trend of the 1970s has been an interest in weighting and its natural corollary, output ranking. In some cases weighting has been determined by the properties of individual documents, or of the collection as a whole, so the tests really fall under the heading of index language or indexing studies; but in other cases weights are associated specifically with query terms, representing a posteriori rather than a priori document indexing, and weighting here is more properly subsumed under searching and the organization of search output, especially by non-boolean matching functions. Different tests have to be examined very carefully here to determine their true rather than apparent concern: for example, document set weights calculated at search time for the query terms only are nevertheless logically distinct from individual query weights. In fact, though tests with manually assigned weights have been carried out, for example by Evans for query terms61, 62, most of the work done on weighting has been done in the context of automatic indexing. The development of research on weighting in this context has, however, paralleled that of work on manual indexing, in that the emphasis has increasingly been on the role of weights in searching. Thus the most noticeable feature of retrieval research in the 1970s has been the experimental work on the general idea of relevance feedback, and on relevance weighting in particular, within automatic systems.
Research in this area was begun by the Smart Project in the 1960s, and is represented by a long series of experiments through the decade4, 63-67. Other tests in the area have been conducted by Miller68-70, UKCIS (Barker, Veal and Wyatt54, 71, and subsequently Robson and Longman72, 73), Cameron74, and Sparck Jones75-77. This approach to searching is particularly interesting as the one most conspicuous, in the whole area of information retrieval, in having some solid theoretical underpinning. These relevance feedback and weighting techniques are largely statistically