IRE
Information Retrieval Experiment
An experiment: search strategy variations in SDI profiles
chapter
Lynn Evans
Butterworth & Company
Karen Sparck Jones
Results
range from about 1000 for CT-type profiles to about 550 for BW-type profiles.
These figures still tend to the optimistic in that they account only for the
purely 'intellectual' aspects of profile handling. Normally time will also be
spent liaising with users (existing and potential), checking subscriptions, etc.
This variable, however, relates more to job responsibility than to any basic
profile compilation or modification procedure, and would not be expected to
change by much the relative intellectual effort required by the different
search strategies.
Computer search costs
With a generalized search package rather than separate optimal programs,
the amount of information obtainable on the actual computer costs for the
different search strategies was limited. Tests with the software showed that
it is the matching of profile and document terms rather than profile logic
evaluation which controls the search rate; in other words, the number of
profile search terms is paramount and the profile logic evaluation represents
a minor part of the total computer processing cost.
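The point about term matching dominating logic evaluation can be illustrated with a toy operation count. This is only a sketch under stated assumptions, not the search package used in the experiment: the function names `match_profile` and `evaluate_logic`, the naive term-by-term comparison, and the simple "OR within a group, AND across groups" logic are all hypothetical.

```python
def match_profile(profile_terms, document_terms):
    """Compare every profile term with the document terms (naive matching).

    Returns a dict of term -> hit flag and the number of comparisons made.
    """
    comparisons = 0
    hits = {}
    for term in profile_terms:
        hits[term] = False
        for word in document_terms:
            comparisons += 1
            if term == word:
                hits[term] = True
                break  # stop scanning once this term has matched
    return hits, comparisons


def evaluate_logic(hits, groups):
    """Evaluate the profile logic: OR within each group, AND across groups.

    Returns the overall result and the number of logic steps taken.
    """
    steps = 0
    result = True
    for group in groups:
        group_hit = False
        for term in group:
            steps += 1
            group_hit = group_hit or hits[term]
        result = result and group_hit
    return result, steps


if __name__ == "__main__":
    document = ["an", "experiment", "in", "information", "retrieval"]
    terms = ["retrieval", "indexing", "profile"]
    hits, comparisons = match_profile(terms, document)
    result, steps = evaluate_logic(hits, [["retrieval", "indexing"], ["profile"]])
    print(comparisons, steps, result)
```

In this sketch the comparison count grows with both the number of profile terms and the length of the document, whereas the logic steps grow only with the number of profile terms, which is one way to see why the number of search terms would control the search rate.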
In the experiment, for each user statement, all search strategies except
CRTW used the same set of search terms, and it may be assumed that the
computer search costs for the different search strategies except CRTW would
not differ by more than 20 per cent.
Financial considerations dictated that the different search strategy types
were not run separately against the document collections and that the
computer matching runs were conducted as a background job timeshared
with a variety of other tasks which differed from run to run.
In these less-than-ideal conditions the computer search costs per query
(search strategy) per year were some 3-5 times the cost of information
scientist time. The other costs mentioned earlier but not investigated would
diminish further the contribution of information scientist time to perhaps
10-15 per cent of the total overall costs. The changing balance in the
man-machine equation probably means that the information scientist time would
now contribute a greater share to the total costs than was the case 5 years ago.
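The arithmetic behind these proportions can be sketched as follows. This is a minimal illustration rather than a calculation from the study: the function name `scientist_share` and the `other_ratio` parameter are assumptions, and every cost is expressed as a multiple of the information scientist cost, which is taken as 1.

```python
def scientist_share(computer_ratio, other_ratio=0.0):
    """Fraction of total cost attributable to information scientist time.

    computer_ratio: computer search cost as a multiple of scientist cost.
    other_ratio: all other costs as a multiple of scientist cost
                 (hypothetical parameter; the source gives no figure for it).
    """
    return 1.0 / (1.0 + computer_ratio + other_ratio)


if __name__ == "__main__":
    # Computer costs at 3 and 5 times the information scientist cost.
    for ratio in (3, 5):
        print(ratio, round(scientist_share(ratio), 3))
```

With computer costs at 3 to 5 times the information scientist cost and nothing else counted, the information scientist share lies between 25 per cent and about 17 per cent; any positive `other_ratio` lowers it further.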
Clearly there are two main approaches to effecting savings in the computer
search costs, viz. reduction in the number of search terms and simplification
of the term-matching procedure. The latter might be achieved by forgoing
some of the more sophisticated matching facilities, e.g. simultaneous
left- and right-hand truncation, ability to distinguish upper and lower case
characters, and the universal character. Individual profiles can be found
where one or more of these facilities is very useful and convenient but in most
instances their absence can be surmounted by the use of additional search
terms at a lesser overall cost. The value of these facilities varies with subject
area and in the chemical field the truncation facility might be considered
vital. A valid evaluation of their importance would probably require a large
number of search profiles for any significant effect to be apparent.
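The matching facilities discussed above, and the way extra search terms can stand in for them, can be sketched as below. The notation is an assumption made for illustration only: `*` marks truncation at either end of a term and `?` stands for the universal character. The function names `compile_term`, `matches` and `any_match` are hypothetical and do not describe the software used in the experiment.

```python
import re


def compile_term(term, case_sensitive=False):
    """Translate a search term into a regular expression.

    '*' = truncation (zero or more word characters),
    '?' = universal character (exactly one word character).
    """
    parts = []
    for ch in term:
        if ch == "*":
            parts.append(r"\w*")
        elif ch == "?":
            parts.append(r"\w")
        else:
            parts.append(re.escape(ch))
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.compile("".join(parts), flags)


def matches(term, word, case_sensitive=False):
    """True if the whole word satisfies the search term."""
    return compile_term(term, case_sensitive).fullmatch(word) is not None


def any_match(terms, word, case_sensitive=False):
    """True if the word satisfies at least one of several search terms."""
    return any(matches(t, word, case_sensitive) for t in terms)


if __name__ == "__main__":
    # Simultaneous left- and right-hand truncation.
    print(matches("*chlor*", "dichloromethane"))
    # Without truncation, the same word needs its own explicit search term.
    print(any_match(["chloride", "chlorine", "dichloromethane"], "dichloromethane"))
```

The second call shows the trade-off described in the text: without truncation the profile must list each word form explicitly, which adds search terms but allows a simpler matching procedure.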
Cost-effectiveness
It became apparent early in the project that insufficient data would be
obtained to enable any absolute conclusions to be drawn concerning the