Information Retrieval Experiment

Retrieval system tests 1958-1978 (chapter), Karen Sparck Jones. Butterworth & Company.

Karen Sparck Jones. All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, including photocopying and recording, without the written permission of the copyright holder, application for which should be addressed to the Publishers. Such written permission must also be obtained before any part of this publication is stored in a retrieval system of any nature.

The results of these tests are some of the most striking of the decade. There are again large variations within the different tests, for example for different weighting formulae; and there are also, making due allowance for different performance representation methods, quite wide variations between tests. But the findings for relevance weighting in particular show large improvements in performance in the recall/precision graphs for weighted compared with unweighted searching. Considering the specific findings, the Smart findings for relevance feedback adding or deleting terms show comparative performance graph differences of 5-15 per cent; Barker et al.'s user-modified profiles exploiting relevance information show gains in relative recall of l[OCRerr] 30 per cent for no loss of precision. Robson and Longman's complicated tests for producing profiles via relevance weighting showed automatic profile performance of 25.6 per cent precision and 61.3 per cent relative recall, or 36.5 per cent precision and 76.3 per cent relative recall, compared with manual results of 31.0 per cent precision and 82.1 per cent recall, or 39.9 per cent precision and 87.9 per cent recall, for different categories of profile.
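
To make the contrast between weighted and unweighted searching concrete, the following is a minimal sketch rather than a reconstruction of any of the tests reported here. It assumes the commonly cited form of the Robertson and Sparck Jones relevance weight with a 0.5 correction; the chapter itself gives no formula, so that choice is an assumption. The six-document collection, the two query terms, the relevance judgements, and all function names are hypothetical. The weights are estimated from the same documents that are then ranked, so the example is retrospective and will flatter the weighted search.

```python
import math


def relevance_weight(r, n, R, N):
    """Relevance weight for one term (assumed 0.5-corrected form).

    r: known relevant documents containing the term
    n: documents containing the term
    R: known relevant documents
    N: documents in the collection
    """
    numerator = (r + 0.5) * (N - n - R + r + 0.5)
    denominator = (n - r + 0.5) * (R - r + 0.5)
    return math.log(numerator / denominator)


def score(doc_terms, query_weights):
    """Sum the weights of the query terms present in a document."""
    return sum(w for term, w in query_weights.items() if term in doc_terms)


def rank(docs, query_weights):
    """Rank document ids by descending score, breaking ties by id."""
    scored = {doc_id: score(terms, query_weights) for doc_id, terms in docs.items()}
    return sorted(scored, key=lambda doc_id: (-scored[doc_id], doc_id))


def precision_recall(retrieved, relevant):
    """Precision and recall for a retrieved list against a relevant set."""
    hits = len(set(retrieved) & set(relevant))
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical toy collection and relevance judgements (not from the source).
DOCS = {
    "d1": {"indexing", "thesaurus"},
    "d2": {"indexing"},
    "d3": {"weighting", "indexing"},
    "d4": {"indexing", "profile"},
    "d5": {"weighting", "feedback"},
    "d6": {"profile"},
}
RELEVANT = {"d3", "d5"}
QUERY = ["weighting", "indexing"]

# Unweighted searching: every query term counts equally.
unweighted = {term: 1 for term in QUERY}

# Weighted searching: estimate each term's weight from the judgements.
weighted = {}
for term in QUERY:
    n = sum(1 for terms in DOCS.values() if term in terms)
    r = sum(1 for doc_id in RELEVANT if term in DOCS[doc_id])
    weighted[term] = relevance_weight(r, n, len(RELEVANT), len(DOCS))

for label, weights in (("unweighted", unweighted), ("weighted", weighted)):
    top_two = rank(DOCS, weights)[:2]
    print(label, top_two, precision_recall(top_two, RELEVANT))
```

On this toy data the unweighted search retrieves d3 and d1 at a cutoff of two documents (precision 0.5, recall 0.5), while the weighted search retrieves d5 and d3 (precision 1.0, recall 1.0). The size of the gap is a product of the contrived data and the retrospective evaluation, and should not be read against the percentages quoted above.
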
The experiments weighting given query term lists show a wide range of differences: Miller's cutoff weighting output gave 15 per cent precision and 64 per cent relative recall, compared with 17 per cent precision and 46 per cent recall for unweighted boolean searching (by my calculation). The Smart Project and Sparck Jones show graph differences ranging from about 5 per cent to several hundred per cent: Sparck Jones' weighting differences ranged from 50 per cent to some hundred per cent, compared with unweighted performance, though these particular findings, like any others, may be partly attributable to the performance representation methods used. Cameron's experiment in this area showed a gain in recall from 44.3 per cent to 60.6 per cent, for a decline in precision from 60.7 per cent to 41.0 per cent.

The predominantly favourable findings have naturally been interpreted as demonstrating the value of the various statistically-based techniques for utilizing relevance information, particularly as in some cases weighting improves on already competitive or good performance. Thus Miller's test shows statistical weighting competitive with standard Medlars boolean searching, and the Smart workers and Sparck Jones claim that their substantial series of tests show that relevance feedback and weighting are useful. In the operational context the UKCIS workers note that the automatic weighting methods do effectively reduce user effort. Certainly the implication of these tests would appear to be that using relevance information, possibly in the particular, theory-motivated way advocated by Robertson and Sparck Jones and van Rijsbergen110, is a helpful approach to retrieval.

A small amount of work has been done on non-statistical automatic indexing, for example by O'Connor82,83 and (pseudo-automatically) by Atherton84. Both of these studies are of interest in concentrating on neglected areas of retrieval, namely the retrieval of passages and of monographs respectively. O'Connor shows simple proximity or rudimentary syntactic strategies to be effective, and Atherton shows the same for crude procedures exploiting significant locations. Klingbiel and Rinker's interesting test85, mentioned earlier, was of machine-aided indexing utilizing elementary parsing and a dictionary; the findings showed performance could compete very successfully with manual indexing. The project is an exception to the general trends of the period, linking recent