IRE
Information Retrieval Experiment
An experiment: search strategy variations in SDI profiles
chapter
Lynn Evans
Butterworth & Company
Karen Sparck Jones
All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, including photocopying and recording, without the written permission of the copyright holder, application for which should be addressed to the Publishers. Such written permission must also be obtained before any part of this publication is stored in a retrieval system of any nature.

TABLE 14.5. Run S data: sign test for significant difference

Search    Significantly better      Not significantly
strategy  than                      different from

Relevance R1 documents

TWC       CG (χ²=16.9, p<0.001)     GTWC (χ²=3.03)
          CT (χ²=14.4, p<0.001)     CRTW (χ²=3.03)
          CTW (χ²=10.0, p<0.01)     GWC (χ²=0.03)
          CGW (χ²=6.4, p<0.02)

GWC       CG (χ²=12.1, p<0.001)     CTW (χ²=3.6), GTWC (χ²=2.03)
          CT (χ²=11.0, p<0.001)     CRTW (χ²=2.03), TWC
          CGW (χ²=9.0, p<0.01)

CRTW      CT (χ²=8.1, p<0.01)       GTWC (χ²=0.03)
          CG (χ²=4.9, p<0.05)       CTW (χ²=0)
                                    CGW (χ²=0)
                                    TWC, GWC

CTW       CT (χ²=18.2, p<0.001)     CGW (χ²=0.03), GTWC (χ²=0.03)
          CG (χ²=11.0, p<0.001)     GWC, CRTW

CGW       CG (χ²=7.2, p<0.01)       GTWC (χ²=0.23), CRTW, CTW
          CT (χ²=6.4, p<0.02)

GTWC      CT (χ²=5.6, p<0.02)       TWC, GWC, CRTW, CTW, CGW
          CG (χ²=5.6, p<0.02)

CG                                  CT (χ²=1.6)

CT                                  CG

Relevance R1/2 documents

TWC       CT (χ²=12.8, p<0.001)     CRTW (χ²=3.76)
          GTWC (χ²=9.8, p<0.01)     CGW (χ²=2.69)
          CG (χ²=7.2, p<0.01)       CTW (χ²=1.8), GWC (χ²=0.02)

GWC       CRTW (χ²=9.8, p<0.01)     CTW (χ²=3.76)
          GTWC (χ²=8.9, p<0.01)     CGW (χ²=3.76)
          CT (χ²=8.0, p<0.01)       TWC
          CG (χ²=6.4, p<0.02)

CRTW                                CT (χ²=1.09), CGW (χ²=1.09)
                                    GTWC (χ²=0.56), CG (χ²=0.36)
                                    CTW (χ²=0.36), TWC

CTW       CT (χ²=21.4, p<0.001)     CGW (χ²=0.09), GTWC (χ²=0.02)
          CG (χ²=6.4, p<0.02)       TWC, GWC, CRTW

CGW       CT (χ²=11.8, p<0.001)     GTWC (χ²=1.8), TWC, GWC
          CG (χ²=5.7, p<0.02)       CRTW, CTW

GTWC                                CT (χ²=0.8), CG (χ²=0)
                                    CRTW, CTW, CGW

CG        CT (χ²=4.4, p<0.05)       CRTW, GTWC

CT                                  CRTW, GTWC

It might be argued
that a more powerful test, such as the Wilcoxon matched-pairs signed-ranks test, could have been used, since the magnitude of the difference in normalized recall between pairs of search strategies was known for all the queries. This was not pursued because of unease concerning the validity and overall effect of those queries with very few relevant documents; as mentioned earlier, in the extreme case of a query with only 1 relevant document, a recall ratio of 0/1 could so easily be 1/1, and vice versa.

Boolean comparison

The method used for this individual profile-by-profile comparison of all the strategies was as follows: