Information Retrieval Experiment
Chapter: Retrieval system tests 1958-1978
Karen Sparck Jones
Butterworth & Company
The decade 1968-1978
devoted to much more thorough performance evaluation than that of the
previous decade. A significant feature of the generally statistical approaches
adopted has been the idea of relative rather than absolute merit, whether in
the characterization of individual documents, of a collection, of requests, or
of document-query matches. Manual indexing tends to involve an all-or-nothing
approach to indexing and retrieval. Numerical measures of merit
can of course be used with a threshold to select items in indexing or searching,
but the general idea of weighting offers more power because it offers more
discrimination; and as indicated earlier, a good deal of the theoretical work in
information retrieval in this decade has been concerned with the notion of
ranking determined by probability.
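The contrast drawn here between all-or-nothing threshold selection and ranking by numerical merit can be sketched in modern terms as follows (the documents and scores are invented for illustration, not taken from any of the experiments discussed):

```python
# Sketch (illustrative only): threshold selection versus weighted ranking.
# Invented match scores for four hypothetical documents against one query.
scores = {"doc1": 0.9, "doc2": 0.4, "doc3": 0.7, "doc4": 0.1}

# All-or-nothing selection: a threshold splits the collection into an
# unordered selected/rejected pair, discarding differences of degree.
threshold = 0.5
selected = {d for d, s in scores.items() if s >= threshold}

# Weighted ranking: every document is ordered by decreasing merit, so the
# discrimination carried by the scores is preserved in the output.
ranked = sorted(scores, key=scores.get, reverse=True)

print(selected)  # {'doc1', 'doc3'}
print(ranked)    # ['doc1', 'doc3', 'doc2', 'doc4']
```

A ranked output also lets the user, rather than the system, decide how far down the list to read, which is one practical sense in which weighting is the more powerful device.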
Evaluation tests on automatic indexing and searching were chiefly devoted
to statistical methods, not simply in the absence of non-statistical techniques,
but with the support of the theories justifying statistical approaches to
indexing and matching. The tests have included ones on individual document
index term weighting, though not selection, on vocabulary selection and
weighting, on term clustering and document clustering, and on query term
selection and weighting. A number of projects have carried out experiments
on more than one of these: the Smart Project work in this decade in particular
has included tests in all of these areas4, 63-66, 96-98. Sparck Jones has been
concerned with vocabulary selection and weighting, term clustering75-77, 79, 80, 93-95,
and query weighting, and van Rijsbergen with term clustering,
document clustering, and query weighting81, 105-107.
Automatic methods not using relevance information
Consider first work on automatic methods other than those involving relevance
information. There has been no evaluation testing of methods for the
direct selection of terms for documents along the lines of Damerau's earlier
investigation, though Evans tested indexing by automatic assignment of
manual thesaurus terms86. Simple weighting by within-document term
frequencies has been studied by the Smart Project96. More attention has
been devoted to the treatment of the collection vocabulary, as in Salton's use
of discrimination functions to select and weight vocabulary terms96, or
the use by Salton96 and Sparck Jones93, 95 of inverse document frequency
weights. A whole range of tests with term clusters, used either to define
weights. A whole range of tests with term clusters, used either to define
classes of substitute terms or sets of additional terms, was carried out by
Vaswani and Cameron78 and by Sparck Jones80, 95, and a more restricted
test by Cagan108. Smart Project tests on term clustering during the decade
have been rather restricted ones with modified manual thesauri and
'statistical phrases'65, 97, 98. Document clustering has been studied by the
Smart workers63 and by van Rijsbergen105-107.
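The two weighting devices mentioned above, within-document term frequency and inverse document frequency, can be illustrated with a minimal sketch. The toy collection and the now-standard logarithmic idf formula are assumptions for illustration; they are not drawn from the experiments described in the text:

```python
import math
from collections import Counter

# Toy collection (invented) of three documents, each a list of index terms.
docs = [
    ["retrieval", "system", "test"],
    ["retrieval", "experiment"],
    ["system", "experiment", "experiment"],
]

n = len(docs)

def tf(term, doc):
    # Within-document term frequency: how often the term occurs in this document.
    return Counter(doc)[term]

def idf(term):
    # Inverse document frequency: terms occurring in few documents of the
    # collection receive high weight, common terms receive low weight.
    df = sum(1 for doc in docs if term in doc)
    return math.log(n / df)

# "experiment" occurs twice within the third document,
# and appears in two of the three documents in the collection.
print(tf("experiment", docs[2]))        # 2
print(round(idf("experiment"), 3))      # log(3/2) ~ 0.405
# "test" appears in only one document, so it gets the highest idf weight.
print(round(idf("test"), 3))            # log(3) ~ 1.099
```

The relative, collection-dependent character of these weights is exactly the contrast with absolute, all-or-nothing indexing noted at the start of the section.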
The focus, motivation and assumptions of these tests were very much
those of the previous decade. The general aim has been to demonstrate the
value of statistical selection, weighting and classification techniques for
retrieval, mostly by comparison with their absence, but sometimes, as in
some Smart tests, by comparison with manual alternatives. More specific
concerns have been to evaluate competing statistical methods for providing
a given device, for example approaches to term classification in Vaswani and
Cameron's and Sparck Jones' experiments, and to term weighting in many