IRE
Information Retrieval Experiment
Retrieval system tests 1958-1978
chapter
Karen Sparck Jones
Butterworth & Company
Karen Sparck Jones
All rights reserved. No part of this publication may be reproduced
or transmitted in any form or by any means, including photocopying
and recording, without the written permission of the copyright holder,
application for which should be addressed to the Publishers. Such
written permission must also be obtained before any part of this
publication is stored in a retrieval system of any nature.
The decade 1968-1978
bibliographic record fields, like titles or abstracts, which were regarded as
representing different indexing languages, rather than a single language,
namely natural language, used for indexing from different sources. The
UKCIS investigation53 illustrates this approach. However, somewhat
greater care was taken in this group of tests in the treatment of such
dependent variables as indexing exhaustivity than was usually the case in the
previous decade's tests.
The second group of tests was indeed concerned with indexing rather than
with the indexing language used, and particularly with exhaustivity and
specificity. Thus Schumacher, March and Scheffler's test55, for instance, was
concerned with the effects of exhaustivity on performance, as was ISILT.
Tests on indexing language specificity, like Svenonius'56, also fall into this
category of more detailed studies of single variables.
The conclusions about the importance of searching reached in some of the
earlier tests were followed up in a number of studies of searching, which has
also been a topic of interest to those responsible for online services. Some of
these studies, like that of Katzer57, were concerned with the form of the
query in a narrow sense, others like those of the UKCIS group or Leggate et
al.58, with the properties of user queries, and yet others like those of
Barraclough et al.59 or Keen's EPSILON test60, with the behaviour of users
in searching. Tests with broad or narrow question formulations like Aitchison
et al.'s also fall into this group.
A particular trend of the 1970s has been an interest in weighting and its
natural corollary, output ranking. In some cases weighting has been
determined by the properties of individual documents, or of the collection as
a whole, so the tests really fall under the heading of index language or
indexing studies; but in other cases weights are associated specifically with
query terms, representing a posteriori rather than a priori document indexing,
and weighting here is more properly subsumed under searching and the
organization of search output, especially by non-boolean matching functions.
Different tests have to be examined very carefully here to determine their
true rather than apparent concern: for example document set weights
calculated at search time for the query terms only are nevertheless logically
distinct from individual query weights. In fact, though tests with manually
assigned weights have been carried out, for example by Evans for query
terms61,62, most of the work done on weighting has been done in the context
of automatic indexing. The development of research on weighting in this
context has, however, paralleled that of work on manual indexing, in that the
emphasis has increasingly been on the role of weights in searching. Thus the
most noticeable feature of retrieval research in the 1970s has been the
experimental work on the general idea of relevance feedback, and on
relevance weighting in particular, within automatic systems. Research in
this area was begun by the Smart Project in the 1960s, and is represented by
a long series of experiments through the decade4,63-67. Other tests in the
area have been conducted by Miller68-70, by UKCIS (Barker, Veal and
Wyatt54,71, and subsequently Robson and Longman72,73), by Cameron74, and
by Sparck Jones75-77. This approach to searching is particularly interesting
because, more conspicuously than any other in the whole area of information
retrieval, it has some solid theoretical underpinning.
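The distinction drawn above between weights derived from the collection and weights attached to individual query terms can be made concrete with a small sketch. Everything in it is an illustrative assumption rather than a procedure taken from any of the tests cited: the toy collection, the function names, the searcher-assigned values, and the use of log(N/n) as the collection-derived weight are all hypothetical choices. Both kinds of weight feed the same non-boolean matching function, which simply sums the weights of the query terms a document contains and ranks the output by that sum.

```python
import math

# Hypothetical toy collection: each document is a set of index terms.
DOCS = {
    "d1": {"indexing", "weighting"},
    "d2": {"retrieval", "thesaurus"},
    "d3": {"retrieval", "weighting", "feedback"},
    "d4": {"retrieval", "exhaustivity"},
    "d5": {"indexing", "exhaustivity"},
}


def collection_weights(docs, terms):
    """Weights computed at search time from collection statistics.

    Assumed form: log(N / n_t), where N is the collection size and n_t
    the number of documents indexed by term t (0.0 if the term is unused).
    """
    n_docs = len(docs)
    weights = {}
    for term in terms:
        n_t = sum(1 for doc_terms in docs.values() if term in doc_terms)
        weights[term] = math.log(n_docs / n_t) if n_t else 0.0
    return weights


def rank(docs, term_weights):
    """Non-boolean matching: score a document by the summed weights of
    the query terms it contains, then order by descending score
    (ties broken by document identifier)."""
    scores = {
        doc_id: sum(w for t, w in term_weights.items() if t in doc_terms)
        for doc_id, doc_terms in docs.items()
    }
    return sorted(scores, key=lambda d: (-scores[d], d))


query_terms = ["retrieval", "weighting", "feedback"]

# (a) Document set weights, calculated for the query terms only.
by_collection = collection_weights(DOCS, query_terms)

# (b) Individual query weights, assigned by the searcher (made-up values).
by_searcher = {"retrieval": 3.0, "weighting": 1.0, "feedback": 1.0}

ranking_a = rank(DOCS, by_collection)
ranking_b = rank(DOCS, by_searcher)
print(ranking_a)
print(ranking_b)
```

In this sketch the two weightings are applied to the same query terms through the same matching function, yet they order the output differently: the collection-derived weights favour the rarer terms, while the searcher's weights favour the term the searcher judged most important. That is the sense in which the two are logically distinct even when both are computed or supplied only at search time.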
These relevance feedback and weighting techniques are largely statistically