Information Retrieval Experiment (IRE), edited by Karen Sparck Jones. Butterworth & Company.
Chapter: An experiment: search strategy variations in SDI profiles, by Lynn Evans.

All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, including photocopying and recording, without the written permission of the copyright holder, application for which should be addressed to the Publishers. Such written permission must also be obtained before any part of this publication is stored in a retrieval system of any nature.

range from about 1000 for CT-type profiles to about 550 for BW-type profiles. These figures still tend towards the optimistic in that they account only for the purely `intellectual' aspects of profile handling. Normally time will also be spent liaising with users (existing and potential), checking subscriptions, etc. This variable, however, relates more to job responsibility than to any basic profile compilation or modification procedure, and would not be expected to change by much the relative intellectual effort required by the different search strategies.

Computer search costs

With a generalized search package rather than separate optimal programs, the amount of information obtainable on the actual computer costs for the different search strategies was limited. Tests with the software showed that it is the matching of profile and document terms, rather than profile logic evaluation, which controls the search rate; in other words, the number of profile search terms is paramount and the profile logic evaluation represents a minor part of the total computer processing cost. In the experiment, for each user statement, all search strategies except CRTW used the same set of search terms, and it may be assumed that the computer search costs for the different search strategies except CRTW would not differ by more than 20 per cent.
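
The observation that term matching, not logic evaluation, dominates the processing cost can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the naive term-by-term comparison, the representation of profile logic as an AND of OR-groups, and the operation counts are hypothetical and do not describe the search package used in the experiment.

```python
from typing import List, Set, Tuple


def match_terms(profile_terms: List[str], doc_terms: List[str]) -> Tuple[Set[str], int]:
    """Compare every profile term with every document term (hypothetical naive matcher).

    Returns the profile terms found in the document and the number of
    comparisons performed, which is len(profile_terms) * len(doc_terms).
    """
    hits: Set[str] = set()
    comparisons = 0
    for p in profile_terms:
        for d in doc_terms:
            comparisons += 1
            if p == d:
                hits.add(p)
    return hits, comparisons


def evaluate_logic(hits: Set[str], groups: List[List[str]]) -> Tuple[bool, int]:
    """Evaluate profile logic written as an AND of OR-groups (an assumed form).

    Returns whether the profile is satisfied and the number of term lookups,
    which is at most the number of terms appearing in the logic.
    """
    operations = 0
    for group in groups:
        satisfied = False
        for term in group:
            operations += 1
            if term in hits:
                satisfied = True
        if not satisfied:
            return False, operations
    return True, operations


# Hypothetical profile: (retrieval OR search) AND (profile OR sdi)
groups = [["retrieval", "search"], ["profile", "sdi"]]
profile_terms = [term for group in groups for term in group]
doc_terms = ["sdi", "profile", "evaluation", "cost", "search", "strategy"]

hits, comparisons = match_terms(profile_terms, doc_terms)
matched, operations = evaluate_logic(hits, groups)
```

In this toy case the matching step performs 24 comparisons against 4 logic lookups. The matching work grows with both the number of profile terms and the length of the document record, whereas the logic work grows only with the number of profile terms, which is consistent with the text's point that strategies sharing the same term set should have similar search costs.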
Financial considerations dictated that the different search strategy types were not run separately against the document collections and that the computer matching runs were conducted as a background job timeshared with a variety of other tasks which differed from run to run. In these less-than-ideal conditions the computer search costs per query (search strategy) per year were some 3-5 times the cost of information scientist time. The other costs mentioned earlier but not investigated would diminish further the contribution of information scientist time to perhaps 10-15 per cent of the total overall costs. The changing balance in the man-machine equation probably means that the information scientist time would now contribute a greater share to the total costs than was the case 5 years ago.

Clearly there are two main approaches to effecting savings in the computer search costs, viz. reduction in the number of search terms and simplification of the term-matching procedure. The latter might be achieved by forgoing some of the more sophisticated matching facilities, e.g. simultaneous left- and right-hand truncation, the ability to distinguish upper and lower case characters, and the universal character. Individual profiles can be found where one or more of these facilities is very useful and convenient, but in most instances their absence can be surmounted by the use of additional search terms at a lesser overall cost. The value of these facilities varies with subject area, and in the chemical field the truncation facility might be considered vital. A valid evaluation of their importance would probably require a large number of search profiles for any significant effect to be apparent.

Cost-effectiveness

It became apparent early in the project that insufficient data would be obtained to enable any absolute conclusions to be drawn concerning the