IRE
Information Retrieval Experiment
An experiment: search strategy variations in SDI profiles
chapter
Lynn Evans
Butterworth & Company
Karen Sparck Jones
Retrospect
maybe it is not sensible to attempt such a comparison in that like is not being
compared with like.
An often-voiced criticism of the original report was that too little attention
had been paid to the statistical significance of the experimental results. An
attempt has been made to rectify that in this paper and undoubtedly it has
helped to clarify the picture. However, establishing statistical significance
when differences are not obvious is still a limited achievement and there is
much appeal in the philosophy of what might be termed a Cleverdonian
maxim: if one needs to resort to statistical techniques to establish
performance differences in information retrieval experiments then the
differences are not worth knowing about.
It was mentioned almost in passing that assigning weights subjectively to
search terms and term-groups was unlikely to appeal greatly to users
attempting their own profiling. For this reason alone it is a pity that automatic
weighting of terms was not possible in the evaluation. Also, since all profiling
was carried out by one experienced compiler, no impression was gained of how
easy end-users (as opposed to professional information staff) would find the
different search strategies to use, or how readily they would accept them. This failing
was realized from the start but it was thought to be too difficult to surmount
easily. The decision to use only one compiler at least ensured the control of
this variable.
It has been thought that perhaps not enough was attempted at the time to
establish the reasons why the strategies performed as they did. The question
of how much failure analysis should be done was considered at some length.
Where strategies performed as might have been anticipated (e.g. it was not
surprising to find that the best retrieval performance was produced by
strategies using weighting techniques) there seemed little purpose in detailed
analysis. In the evaluation of an operational system it is clearly important to
obtain a measure of which activities are responsible for retrieval failures, in
particular what proportion can be allocated to poor indexing or to profile
compilation errors. In this experiment, since all strategies shared the same
basic list of search terms for each query, such failures would be common to
all strategies. Thus the main interest was in distinguishing any differences
due to the characteristics of the strategies themselves. The most promising
procedure seemed to be to examine those queries for which the strategies
that performed best overall did unusually badly. Pursuing this method
showed up one clear link between search strategy performance and type of
user statement. In those strategies comprising a single list of terms (CT,
TWC, CTW), as opposed to those including term groups, there is the possible
deleterious effect of one concept 'swamping' all others in the list of
profile search terms. The damage occurs when the document free-indexing is
similarly 'unbalanced', which can of course be quite legitimate and unavoidable.
For example, the concept 'metals' comprises more than 60 individual metal
elements, and the literature is such that a paper on some aspect of metal
behaviour often deals with a number of different metals, all of which are properly
included among the index terms. The result is that the outputs from single-list
strategies are top-heavy with the individual 'metal' terms which, in the
term-group strategies, are controlled by virtue of being in a term group which
contributes only once to the total weight irrespective of the number of