IRE
Information Retrieval Experiment
An experiment: search strategy variations in SDI profiles
chapter
Lynn Evans
Butterworth & Company
Karen Sparck Jones
All rights reserved. No part of this publication may be reproduced
or transmitted in any form or by any means, including photocopying
and recording, without the written permission of the copyright holder,
application for which should be addressed to the Publishers. Such
written permission must also be obtained before any part of this
publication is stored in a retrieval system of any nature.
overall most cost-effective search strategy. The discussion on p.307 suggests
that the computer search costs for all search strategies except CRTW would
not differ by more than 20 per cent because they all use the same basic set of
search terms. However, strategy CRTW, which by definition always contains
20 search terms compared with an average of 46 terms for the other strategies,
is thereby much more economic in terms of computer search costs. If its
retrieval performance is on a par with the other strategies then strategy
CRTW must be the most cost-effective of all the 10 strategies evaluated in
the main experiment. The results presented on pp.299 et seq. show that this
is in fact the case.
A more modest possibility with the available information is a comparison
of the cost-effectiveness of all the search strategies in terms of information
scientist cost only. A suitable measure of effectiveness is 'relevant documents
retrieved' and, certainly in the SDI situation, the number of relevant
documents retrieved at cutoff points of, say, 15 or 25 documents would seem
to be the most appropriate.
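
The measure described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `relevant_at_cutoff`, the document identifiers and the relevance judgements are all hypothetical and are not taken from the experiment.

```python
def relevant_at_cutoff(ranked_output, relevant_ids, cutoff):
    """Count the relevant documents among the first `cutoff` items of a ranked output."""
    top_items = ranked_output[:cutoff]  # a shorter output is simply used in full
    return sum(1 for doc_id in top_items if doc_id in relevant_ids)


# Hypothetical ranked SDI output for one query on one run (best match first).
ranked = ["d07", "d12", "d03", "d21", "d09", "d15", "d02", "d30"]
# Hypothetical set of documents judged relevant by the user.
relevant = {"d12", "d09", "d30", "d44"}

print(relevant_at_cutoff(ranked, relevant, cutoff=5))
print(relevant_at_cutoff(ranked, relevant, cutoff=15))
```

Note that when an output contains fewer items than the cutoff, as in the second call, the count is taken over whatever was retrieved, so such outputs are not strictly comparable with full-length ones.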
The basic 'effectiveness' data included in the original report show, for
example, that at a cutoff point of 15 items, strategy CT retrieved 164, 127 and
120 relevance R1 documents, respectively, on runs 1 (46 queries), 5 (45
queries) and 6 (46 queries). This gives an average figure of
(164 + 127 + 120)/(46 + 45 + 46), i.e. 3.0 relevance R1 documents retrieved
per query per run by strategy CT at a cutoff point of 15 items. Assuming 50
runs per year (weekly SDI service) this figure becomes 150 relevance R1
documents retrieved per query per year. Dividing this number by the one
given for information scientist effort on strategy CT in Table 14.7 gives a
figure for the cost-effectiveness of strategy CT at a cutoff point of 15 items,
i.e. 150/1.5 or 100 relevance R1 documents retrieved per year per hour of
information scientist effort. Repeating the calculation for all the strategies at
cutoff points of 15 and 25 items gives the comparative cost-effectiveness
figures shown in Table 14.8.
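
The arithmetic for strategy CT can be retraced with a short Python sketch. The figures are those quoted in the text; the function name `cost_effectiveness` and its parameter names are illustrative assumptions, and the effort figure of 1.5 is taken here to be hours of information scientist time per query, as the units of the result imply.

```python
def cost_effectiveness(relevant_per_run, queries_per_run, runs_per_year, effort_hours):
    """Return (per query per run, per query per year, per year per hour of effort)."""
    # Average over runs: total relevant documents divided by total queries.
    per_query_per_run = sum(relevant_per_run) / sum(queries_per_run)
    # Scale up to a year of SDI runs.
    per_query_per_year = per_query_per_run * runs_per_year
    # Relate the yearly yield to the information scientist effort.
    per_year_per_hour = per_query_per_year / effort_hours
    return per_query_per_run, per_query_per_year, per_year_per_hour


# Strategy CT, relevance R1, cutoff point of 15 items (runs 1, 5 and 6).
per_run, per_year, per_hour = cost_effectiveness(
    relevant_per_run=[164, 127, 120],
    queries_per_run=[46, 45, 46],
    runs_per_year=50,   # weekly SDI service, as assumed in the text
    effort_hours=1.5,   # effort figure quoted from Table 14.7
)
print(per_run, per_year, per_hour)
```

For CT the average works out to exactly 3.0, so no rounding question arises. Whether intermediate values were rounded for the other strategies is not stated, and this sketch does not round them.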
TABLE 14.8. Cost-effectiveness of search strategies (information scientist effort only)

Strategy cost-effectiveness (relevant documents retrieved/year/hour of information scientist time)

Order of   Relevance R1 documents          Relevance R1/2 documents
merit      Cutoff 15       Cutoff 25       Cutoff 15       Cutoff 25
1          CRTW (108.8)    CRTW (150.9)    CRTW (251.1)    CRTW (355.0)
2          CT   (100.0)    CT   (139.4)    CT   (233.6)    CT   (347.0)
3          TWC  (98.2)     TWC  (136.0)    TWC  (211.6)    TWC  (311.4)
4          CTW  (91.9)     CTW  (124.7)    CTW  (204.5)    CTW  (303.3)
5          CG   (89.0)     CG   (121.4)    CG   (202.8)    CG   (294.1)
6          GTWC (83.3)     GTWC (114.8)    CGW  (180.8)    CGW  (264.9)
7          CGW  (81.8)     GWC  (109.5)    GWC  (175.5)    GWC  (251.9)
8          GWC  (81.3)     CGW  (108.3)    GTWC (169.9)    GTWC (245.5)
9          BW   (62.8)     BW   (77.8)     BW   (128.0)    BW   (167.9)

Notes. (i) The information scientist effort for strategy CRTW was assumed equal to that for strategy CT (see p. 291).
(ii) The figures for Boolean strategy BW are not strictly comparable with the others since some of the Boolean outputs
were less than 15/25 items. Strategy B was omitted.
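
As a small check on the order of merit, the first column of Table 14.8 (relevance R1, cutoff point of 15 items) can be re-ranked by sorting on the cost-effectiveness figure. The values are copied from the table; the variable names are illustrative only.

```python
# Cost-effectiveness figures from Table 14.8, relevance R1, cutoff 15,
# deliberately listed out of order.
r1_cutoff_15 = {
    "CT": 100.0, "BW": 62.8, "CRTW": 108.8, "CGW": 81.8, "TWC": 98.2,
    "GWC": 81.3, "CTW": 91.9, "GTWC": 83.3, "CG": 89.0,
}

# Order of merit: highest cost-effectiveness first.
order_of_merit = sorted(r1_cutoff_15, key=r1_cutoff_15.get, reverse=True)

for rank, strategy in enumerate(order_of_merit, start=1):
    print(rank, strategy, r1_cutoff_15[strategy])
```

The same sort applied to each of the other three columns would give their respective orderings, subject to the caveat in note (ii) about strategy BW.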