IRE
Information Retrieval Experiment
Retrieval system tests 1958-1978
chapter
Karen Sparck Jones
Butterworth & Company
Karen Sparck Jones
All rights reserved. No part of this publication may be reproduced
or transmitted in any form or by any means, including photocopying
and recording, without the written permission of the copyright holder,
application for which should be addressed to the Publishers. Such
written permission must also be obtained before any part of this
publication is stored in a retrieval system of any nature.
The results of these tests are some of the most striking of the decade. There
are again large variations within tests, for example between different
weighting formulae; and, making due allowance for different performance
representation methods, there are also quite wide variations between tests.
But the findings for relevance weighting in particular show large improvements
in performance in the recall/precision graphs for weighted compared with
unweighted searching. Turning to specific results, the Smart findings for
relevance feedback adding or deleting terms show comparative performance
graph differences of 5-15 per cent; Barker et al.'s user-modified profiles
exploiting relevance information show gains in relative recall of
[illegible]-30 per cent for no loss of precision. Robson and Longman's
complicated tests for producing profiles via relevance weighting showed
automatic profile performance of 25.6 per cent precision and 61.3 per cent
relative recall, or 36.5 per cent precision and 76.3 per cent relative recall,
compared with manual results of 31.0 per cent precision and 82.1 per cent
recall, or 39.9 per cent precision and 87.9 per cent recall, for different
categories of profile.
The experiments weighting given lists of query terms show a wide range of
differences. Miller's cutoff weighting output gave 15 per cent precision and
64 per cent relative recall, compared with 17 per cent precision and 46 per
cent recall for unweighted boolean searching (by my calculation). The Smart
Project and Sparck Jones show graph differences ranging from about 5 per cent
to several hundred per cent: Sparck Jones' weighting differences ranged from
50 per cent to some hundred per cent compared with unweighted performance,
though these particular findings, like any others, may be partly attributable
to the performance representation methods used. Cameron's experiment in
this area showed a gain in recall from 44.3 per cent to 60.6 per cent, for a
decline in precision from 60.7 per cent to 41.0 per cent.
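The caution about performance representation methods can be made concrete. The same pair of figures looks quite different when expressed as a percentage-point change and when expressed as a relative change. The Python sketch below applies both measures to the Cameron and Miller figures as reported in the text. It only illustrates the arithmetic and is not how any of the original tests presented its results. The function name `change` is hypothetical.

```python
def change(before: float, after: float) -> tuple[float, float]:
    """Return (percentage-point change, relative change in per cent), rounded to 0.1."""
    points = after - before
    relative = 100.0 * points / before
    return round(points, 1), round(relative, 1)

# Figures (per cent) as quoted in the text: (unweighted/before, weighted/after).
reported = {
    "Cameron recall":    (44.3, 60.6),
    "Cameron precision": (60.7, 41.0),
    "Miller recall":     (46.0, 64.0),  # unweighted boolean -> cutoff weighting
    "Miller precision":  (17.0, 15.0),
}

for label, (before, after) in reported.items():
    points, relative = change(before, after)
    print(f"{label}: {points:+.1f} points, {relative:+.1f}% relative")
```

For example, Cameron's recall gain is 16.3 points, which is a relative gain of about 36.8 per cent. A test that reports one form rather than the other can therefore appear more or less favourable.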
The predominantly favourable findings have naturally been interpreted as
demonstrating the value of the various statistically-based techniques for
utilizing relevance information, particularly as in some cases weighting
improves on already competitive or good performance. Thus Miller's test
shows statistical weighting competitive with standard Medlars boolean
searching, and the Smart workers and Sparck Jones claim that their
substantial series of tests show that relevance feedback and weighting are
useful. In the operational context the UKCIS workers note that the automatic
weighting methods do effectively reduce user effort.
Certainly the implication of these tests would appear to be that using
relevance information, possibly in the particular, theory-motivated way
advocated by Robertson and Sparck Jones and van Rijsbergen110, is a helpful
approach to retrieval.
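Relevance weighting of this kind can be illustrated with the term weight commonly attributed to Robertson and Sparck Jones. The weight is computed from the counts of relevant and non-relevant documents that do and do not contain the term. The Python sketch below uses the widely cited form in which 0.5 is added to each cell of the contingency table. The surveyed tests used a variety of formulae, so treat this as one representative variant under that assumption, not as the method of any particular test. The names `rsj_weight` and `rank`, the terms and the counts are all hypothetical.

```python
import math

def rsj_weight(N: int, n: int, R: int, r: int) -> float:
    """Relevance weight for one term (Robertson/Sparck Jones form, 0.5-corrected).

    N: documents in the collection   n: documents containing the term
    R: known relevant documents      r: relevant documents containing the term
    Adding 0.5 to each cell avoids division by zero and log(0) for small samples.
    """
    return math.log(((r + 0.5) * (N - n - R + r + 0.5)) /
                    ((n - r + 0.5) * (R - r + 0.5)))

def rank(docs: dict[str, set[str]], weights: dict[str, float]) -> list[str]:
    """Order documents by the summed weights of the query terms they contain."""
    scores = {d: sum(w for t, w in weights.items() if t in terms)
              for d, terms in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical statistics: N=1000 documents, R=10 known relevant ones.
weights = {
    "indexing":  rsj_weight(1000, 50, 10, 8),   # concentrated in relevant documents
    "retrieval": rsj_weight(1000, 100, 10, 1),  # spread roughly evenly
}
docs = {"d1": {"indexing"}, "d2": {"retrieval"}, "d3": {"indexing", "retrieval"}}
print(rank(docs, weights))  # ['d3', 'd1', 'd2']
```

A term found mostly in known relevant documents receives a large positive weight, and a term absent from them can receive a negative one. This is how relevance information reshapes the ranking without any change to the query terms themselves.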
A small amount of work has been done on non-statistical automatic
indexing, for example by O'Connor82,83 and (pseudo-automatically) by
Atherton84. Both of these studies are of interest in concentrating on neglected
areas of retrieval, namely passages and monographs respectively.
O'Connor shows that simple proximity or rudimentary syntactic strategies are
effective, and Atherton that crude procedures utilizing significant locations
are. Klingbiel and Rinker's interesting test85, mentioned earlier, was of
machine-aided indexing utilizing elementary parsing and a dictionary; the
findings showed that its performance could compete very successfully with
manual indexing. The
project is an exception to the general trends of the period, linking recent