Information Retrieval Experiment

IRE: Information Retrieval Experiment, ed. Karen Sparck Jones (Butterworth & Company). Chapter: "An experiment: search strategy variations in SDI profiles", Lynn Evans.

All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, including photocopying and recording, without the written permission of the copyright holder, application for which should be addressed to the Publishers. Such written permission must also be obtained before any part of this publication is stored in a retrieval system of any nature.

Table 14.2 shows that, in terms of information scientist effort, the simplest strategy, CT, takes almost exactly half as long to compile as the most complex strategy, BW.

Modification times

As mentioned on p.287 above, the profiles were analysed and (perhaps) modified just once, viz. after completion of the first group of 4 SDI runs but before starting the second group. The initial analysis procedure was standard for all profiles, irrespective of whether any relevance assessments had been received from the users. Ten minutes were spent examining the performance of each profile. After that, a decision was taken as to whether any basic modifications were necessary. Twenty-two users' profiles were in fact amended. It is emphasized that the time taken for any particular modification is assigned in full to every search strategy variation incorporating that modification. For example, if 20 min are spent amending the boolean equations, this time is allocated to both strategies B and BW. The modification time data were averaged over all profiles, whether modified or not, and the 10 min initial analysis time was added. The resulting average strategy modification times were: CT = 13, CG = 13, CTW = 14, CGW = 14, TWC = 15, GWC = 15, GTWC = 15, B = 21 and BW = 22 min.
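The allocation rule for modification times can be made concrete with a small sketch. It assumes that each modification is recorded as a time in minutes together with the set of strategies that incorporate it. The full time of each modification is charged to every one of those strategies. The totals are then averaged over all profiles, including unmodified ones, and the fixed 10 min initial analysis is added. The function name `average_modification_times` and the per-profile records in the usage example are hypothetical illustrations, not data from the experiment; only the strategy labels and the 10 min figure come from the text.

```python
from collections import defaultdict

INITIAL_ANALYSIS_MIN = 10  # fixed examination time charged to every profile (from the text)

# Strategy labels as listed in the text.
STRATEGIES = ["CT", "CG", "CTW", "CGW", "TWC", "GWC", "GTWC", "B", "BW"]


def average_modification_times(profiles, strategies=STRATEGIES, initial=INITIAL_ANALYSIS_MIN):
    """Hypothetical helper: mean modification time (minutes) per strategy.

    profiles: one entry per profile; each entry is a list of
    (minutes, strategies_affected) tuples, empty if the profile was not modified.
    """
    if not profiles:
        raise ValueError("need at least one profile to average over")
    totals = defaultdict(float)
    for modifications in profiles:
        for minutes, affected in modifications:
            for strategy in affected:
                totals[strategy] += minutes  # full time assigned to each affected strategy
    n = len(profiles)  # unmodified profiles still count in the average
    return {s: initial + totals[s] / n for s in strategies}


# Hypothetical records for four profiles (illustrative numbers only).
profiles = [
    [(20, {"B", "BW"})],     # boolean equations amended: charged to B and BW
    [],                      # profile left unmodified
    [(8, set(STRATEGIES))],  # change common to every strategy
    [(4, {"BW"})],           # change affecting BW only
]
avg = average_modification_times(profiles)
print(avg["CT"], avg["B"], avg["BW"])  # 12.0 17.0 18.0
```

Because shared modifications are charged in full to each strategy rather than split between them, the strategies that incorporate the most kinds of modification (here B and BW) accumulate the largest averages. This matches the ordering of the reported figures.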
Discussion

Before leaving the profile compilation procedure, it may be useful to discuss the standard tasks in more detail. In particular, it is worth considering some of the conflicts that arose in trying to balance experimental rigour against what common sense indicated should be done in a real situation. It has already been stated that, for a valid comparison of search strategies, it seemed essential to use the same basic set of search terms for a particular user statement. In fact, occasions arose when this was contrary to the needs of particular strategies. Examples are the use of negative weights, NOT logic and WITHIN logic, facilities which do not feature sensibly in the co-ordination strategies CT, CG, CTW, CGW and CRTW. Examples of the use of these facilities are detailed in the original report. The extent of their use is indicated by the fact that, of the 55 statements received, negative weights were included for 10 users, NOT logic for 8 users and WITHIN logic for 2 users. Another general problem occurs when the original user statement really covers more than one basic subject interest or question. With boolean strategies there would be no problem if nesting or sublogic facilities were available. With co-ordination strategies, however, it seems nonsensical to mix search terms from what are essentially different questions. Where more than one subject interest was obviously involved, the user statement was divided and treated as 2 (and once 3) completely separate questions. This was done for 6 users, and it is now felt that it should have been done for more of the user statements. Some specific problems encountered when executing the individual standard tasks were