IRE
Information Retrieval Experiment
An experiment: search strategy variations in SDI profiles
chapter
Lynn Evans
Butterworth & Company
Karen Sparck Jones
All rights reserved. No part of this publication may be reproduced
or transmitted in any form or by any means, including photocopying
and recording, without the written permission of the copyright holder,
application for which should be addressed to the Publishers. Such
written permission must also be obtained before any part of this
publication is stored in a retrieval system of any nature.
Results
Some points of interest to emerge from Table 14.8 are:
(1) Strategy CRTW is confirmed, without qualification, as the most cost-
effective search strategy.
(2) Compared with the ranking of strategies based on retrieval performance
only (p.301), strategies which have strikingly changed their relative
positions are CT (upwards), GWC (downwards), and BW (downwards).
(3) The 4 strategies comprising basically a single list of search terms, viz.
CRTW, CT, TWC and CTW, occupy the top four positions.
Free- and controlled-language comparison
Although not envisaged as part of the original project, the data that became
available during the experiment was considered suitable for a direct
comparison of free-language and controlled-language boolean profiles in the
INSPEC environment. The data covered profile compilation times, number
of search terms, and recall/precision performance figures. These are all
detailed in the original report and are only summarized here.
The average compilation time for the controlled-language boolean profiles
(31 min) was just less than half that for the free-language boolean profiles (65
min). The times recorded were for a compiler who was already familiar with
the controlled language concerned, namely INSPEC's thesaurus and unified
classification scheme. These compilation times may be slightly biassed in
favour of the controlled-language profiles because invariably the free-
language versions were compiled first. When the controlled-language version
came to be compiled there would probably be some memory of the original
user statement even though the free-language version may have been
compiled some weeks earlier.
As well as having shorter compilation times, the controlled-language
boolean profiles were smaller than the free-language boolean profiles by a
factor of 2+, averaging 19 terms and 47 terms respectively. It should be
pointed out that one reason for the smaller number of search terms in the
controlled-language profiles is that they were used in searching in a `free-text'
way, i.e. extensive use was made of truncation in the search terms. Assuming
an approximately linear relationship between the number of profile search
terms and computer search time, this factor of 2+ would be largely reflected
in the computer search costs in favour of the controlled-language profiles.
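
The arithmetic behind this estimate can be sketched as follows. This is only an illustration, assuming that search time is strictly proportional to the number of profile terms; the function name and the unit time per term are hypothetical and do not come from the original report.

```python
# Average profile sizes reported in the text (search terms per profile).
CONTROLLED_TERMS = 19
FREE_TERMS = 47


def relative_search_time(n_terms: int, time_per_term: float = 1.0) -> float:
    """Search time in arbitrary units under an assumed strictly linear model."""
    return n_terms * time_per_term


# Ratio of profile sizes, free-language to controlled-language.
size_ratio = FREE_TERMS / CONTROLLED_TERMS

# Under the linear assumption the time ratio equals the size ratio,
# whatever value is chosen for time_per_term.
time_ratio = relative_search_time(FREE_TERMS) / relative_search_time(CONTROLLED_TERMS)

print(f"profile size ratio (free / controlled): {size_ratio:.2f}")
print(f"search time ratio under the linear assumption: {time_ratio:.2f}")
```

Because the per-term time cancels in the ratio, the estimate depends only on the two average profile sizes, not on any particular machine or cost figure.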
There is a further saving of search time for controlled-language profiles
because the controlled-language searchable fields are smaller than the free-
language searchable field in the INSPEC database. Statistics then current
indicated that the relative sizes of the three fields were in the ratios:
    Free indexing                          10
    Subject headings (thesaurus terms)      5
    Unified classification codes            1
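
To put these ratios in proportion, the short sketch below expresses each field as a fraction of the free-indexing field. The combined figure is an assumption made here for illustration, namely that a controlled-language profile scans both the subject-heading and the classification-code fields; the text does not say how the fields were combined in practice, and the dictionary keys are hypothetical labels.

```python
# Relative field sizes quoted in the text (free indexing taken as 10).
FIELD_SIZE = {
    "free_indexing": 10,
    "subject_headings": 5,
    "classification_codes": 1,
}


def size_relative_to_free(field: str) -> float:
    """Size of a searchable field as a fraction of the free-indexing field."""
    return FIELD_SIZE[field] / FIELD_SIZE["free_indexing"]


# Assumption (not stated in the text): a controlled-language search
# reads both controlled fields, so their sizes are added.
controlled_total = FIELD_SIZE["subject_headings"] + FIELD_SIZE["classification_codes"]
controlled_share = controlled_total / FIELD_SIZE["free_indexing"]

for name in FIELD_SIZE:
    print(f"{name}: {size_relative_to_free(name):.1f} of the free-indexing field")
print(f"both controlled fields together: {controlled_share:.1f}")
```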
Because compilation of the controlled-language boolean profiles was
started after the main experiment had got under way, the quantity of
experimental data for the first few SDI runs was limited. Only the
recall/precision figures for the last three of the eight runs were analysed, i.e.
those for runs 6, 7 and 8. Overall values for recall and precision calculated by
the usual two averaging methods (average of numbers and average of ratios)