IRE
Information Retrieval Experiment
An experiment: search strategy variations in SDI profiles
chapter
Lynn Evans
Butterworth & Company
Karen Sparck Jones
All rights reserved. No part of this publication may be reproduced
or transmitted in any form or by any means, including photocopying
and recording, without the written permission of the copyright holder,
application for which should be addressed to the Publishers. Such
written permission must also be obtained before any part of this
publication is stored in a retrieval system of any nature.
matching terms in that group. Although the effect was quite considerable
with the 'metals' concept it did not occur so obviously with other concepts,
because most documents are such that the same concept is not required to be
represented a large number of times in the free-index terms of a particular
document.
Perhaps the one surprising experimental result which did merit consider-
ation was the comparatively good retrieval performance of strategy CRTW.
Of the other strategies CRTW was most similar in type to CTW. The fact
that a reduction in the number of search terms by more than half (CRTW
always contained 20 terms and CTW contained, on average, 46 terms) had
produced no significant difference in retrieval performance was unexpected
and rather deflating. It suggests that the number of search terms in profiles
should be optimized rather than compiled exhaustively. The decision that
strategy CRTW should contain a maximum of 20 terms for each query was
arbitrary and not much more than a reasonable assumption, and the optimum
number of terms would be expected to vary from query to query depending
on their subject matter. One explanation for the result, offered originally by
Cleverdon, is that the arithmetic product 'Number of profile search terms ×
Number of document index terms' has a critical value which, if exceeded,
results in a deteriorating retrieval performance. This view is supported by
the results from another INSPEC project¹⁶ in which profile search terms
were expanded automatically by reference to a thesaurus as a source of,
successively, synonyms, narrower terms and 'see also' terms. It was found,
against expectation, that although the retrieval performances 'were not
usefully different ... the general trend was for poorer recall with each
expansion'. The important factor was that the base profile version contained
41 terms on average and the final expanded version contained 122 terms on
average, figures which were probably well above the optimum. This effect of
profile length can of course be counteracted by the use of weights and
grouping of terms but it remains an interesting point when searching by the
simplest strategy, straight co-ordination of unweighted terms.
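
The simplest strategy mentioned above, straight co-ordination of unweighted terms, can be illustrated with a minimal sketch. It assumes that a profile and a document are each represented as a plain list of terms and that a document is retrieved when the number of distinct matching terms reaches a chosen threshold. The function names, the sample terms, the sample documents and the threshold are hypothetical and are not taken from the INSPEC experiment. The last function only computes the product 'Number of profile search terms × Number of document index terms'; no critical value is given in the text, so none is assumed here.

```python
from typing import Dict, Iterable, List, Tuple


def coordination_level(profile_terms: Iterable[str], doc_terms: Iterable[str]) -> int:
    """Count the distinct profile terms that also occur among the document's index terms."""
    return len(set(profile_terms) & set(doc_terms))


def retrieve(
    profile_terms: Iterable[str],
    documents: Dict[str, List[str]],
    threshold: int,
) -> List[Tuple[str, int]]:
    """Return (doc_id, level) pairs for documents matching at least `threshold`
    profile terms, highest co-ordination level first (ties ordered by doc_id)."""
    profile = list(profile_terms)
    hits = []
    for doc_id, doc_terms in documents.items():
        level = coordination_level(profile, doc_terms)
        if level >= threshold:
            hits.append((doc_id, level))
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


def term_product(n_profile_terms: int, n_doc_index_terms: int) -> int:
    """The arithmetic product of profile length and document index length."""
    return n_profile_terms * n_doc_index_terms


# Hypothetical profile and documents, for illustration only.
profile = ["copper", "alloy", "corrosion"]
documents = {
    "d1": ["copper", "corrosion", "seawater"],
    "d2": ["alloy"],
    "d3": ["laser", "optics"],
}

strict = retrieve(profile, documents, threshold=2)
loose = retrieve(profile, documents, threshold=1)

# Profile sizes quoted in the text (20 and 46 terms); the document index
# length of 10 terms is an assumed figure, not one reported in the chapter.
product_crtw = term_product(20, 10)
product_ctw = term_product(46, 10)
```

With a fixed document index length, the product grows in direct proportion to profile length, so a 46-term profile gives a product 2.3 times that of a 20-term one whatever document length is assumed.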
References
1. CLAGUE, P. SDI Investigation 1967-1969, 5 Vols, Report R71/6, INSPEC, Institution of
Electrical Engineers, London (1971)
2. AITCHISON, T. M. et al. Comparative Evaluation of Index Languages: Part 1 - Design; Part 2 -
Results, Reports R70/1 and R70/2, INSPEC, Institution of Electrical Engineers, London
(1969, 1970)
3. EVANS, L. Optimum Degree of User Participation in SDI Profile Generation, Report R73/12,
INSPEC, Institution of Electrical Engineers, London (1973)
4. CLEVERDON, C. W. and HARDING, P. Report on an Investigation into a Mechanised Information
Retrieval Service in a Specialised Subject Area (CRISPE Project), Cranfield Institute of
Technology (1970)
5. EVANS, L. Search Strategy Variations in SDI Profiles, Report R75/21, INSPEC, Institution of
Electrical Engineers, London (1975)
6. SPARCK JONES, K. and VAN RIJSBERGEN, C. J. Report on the Need for and Provision of an 'Ideal'
Information Retrieval Test Collection, British Library Research and Development Report
5266, Computer Laboratory, University of Cambridge (1975)
7. SPARCK JONES, K. and BATES, R. G. Report on a Design Study for the 'Ideal' Information
Retrieval Test Collection, British Library Research and Development Report 5428, Computer
Laboratory, University of Cambridge (1977)