IRE Information Retrieval Experiment An experiment: search strategy variations in SDI profiles chapter Lynn Evans Butterworth & Company Karen Sparck Jones All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, including photocopying and recording, without the written permission of the copyright holder, application for which should be addressed to the Publishers. Such written permission must also be obtained before any part of this publication is stored in a retrieval system of any nature.

Retrospect

maybe it is not sensible to attempt such a comparison in that like is not being compared with like.

An often-voiced criticism of the original report was that too little attention had been paid to the statistical significance of the experimental results. An attempt has been made to rectify that in this paper and undoubtedly it has helped to clarify the picture. However, establishing statistical significance when differences are not obvious is still a limited achievement, and there is much appeal in the philosophy of what might be termed a Cleverdonian maxim: if one needs to resort to statistical techniques to establish performance differences in information retrieval experiments, then the differences are not worth knowing about.

It was mentioned almost in passing that assigning weights subjectively to search terms and term-groups was unlikely to appeal greatly to users attempting their own profiling. For this reason alone it is a pity that automatic weighting of terms was not possible in the evaluation. Also, since all profiling was carried out by one experienced compiler, there was no impression gained as to the likely ease-of-use, and acceptance, by end-users (as opposed to professional information staff) of the different search strategies. This failing was realized from the start but it was thought to be too difficult to surmount easily. 
The decision to use only one compiler at least ensured the control of this variable.

It has been thought that perhaps not enough was attempted at the time to establish the reasons why the strategies performed as they did. The question of how much failure analysis should be done was considered at some length. Where strategies performed as might have been anticipated (e.g. it was not surprising to find that the best retrieval performance was produced by strategies using weighting techniques) there seemed little purpose in detailed analysis. In the evaluation of an operational system it is clearly important to obtain a measure of which activities are responsible for retrieval failures, in particular what proportion can be allocated to poor indexing or to profile compilation errors. In this experiment, since all strategies shared the same basic list of search terms for each query, such failures would be common to all strategies. Thus the main interest was in distinguishing any differences due to the characteristics of the strategies themselves. The most promising procedure seemed to be to examine those queries for which the strategies that performed best overall did unusually badly.

Pursuing this method showed up one clear link between search strategy performance and type of user statement. In those strategies comprising a single list of terms (CT, TWC, CTW), as opposed to those including term groups, there is the possible deleterious effect of having one concept 'swamping' all others in the list of profile search terms. The damage occurs when the document free-indexing is similarly 'unbalanced'; this, of course, can quite legitimately be unavoidable. For example, the concept 'metals' comprises more than 60 individual metal elements, and the literature is such that often a paper on some aspect of metal behaviour deals with a number of different metals, all of which are properly included among the index terms. 
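
The swamping effect can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the unit weights, the five-metal subset and the two sample documents are hypothetical and are not taken from the experiment. It compares a single-list score, where every matched term adds its own weight, with a term-group score, where a group adds its weight once however many of its members are matched.

```python
from typing import Dict, List, Set, Tuple


def single_list_score(index_terms: Set[str], term_weights: Dict[str, float]) -> float:
    """Sum the weight of every profile term found among the document's index terms."""
    return sum(weight for term, weight in term_weights.items() if term in index_terms)


def term_group_score(index_terms: Set[str], groups: List[Tuple[Set[str], float]]) -> float:
    """Add each group's weight once if at least one of its terms is matched."""
    return sum(weight for terms, weight in groups if terms & index_terms)


# Hypothetical profile for a query on the corrosion of metals in seawater.
# Only five metals are listed here; a real profile could hold far more.
METALS = {"iron", "copper", "zinc", "nickel", "aluminium"}

# Single-list form: every term carries an (assumed) unit weight.
single_list = {term: 1.0 for term in METALS | {"corrosion", "seawater"}}

# Term-group form: the metals form one group that counts once.
groups = [({"corrosion"}, 1.0), ({"seawater"}, 1.0), (METALS, 1.0)]

# Two illustrative documents, described by their free-index terms.
on_topic = {"corrosion", "seawater", "iron"}
metal_heavy = {"iron", "copper", "zinc", "nickel", "aluminium", "conductivity"}

single_on_topic = single_list_score(on_topic, single_list)
single_metal_heavy = single_list_score(metal_heavy, single_list)
group_on_topic = term_group_score(on_topic, groups)
group_metal_heavy = term_group_score(metal_heavy, groups)

print("single list:", single_on_topic, single_metal_heavy)
print("term groups:", group_on_topic, group_metal_heavy)
```

Under these assumptions the single-list score ranks the metal-heavy document above the on-topic one, while the term-group score reverses the order, because the many metal terms together count only once.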
The result is that the outputs from single-list strategies are top-heavy with the individual 'metal' terms which, in the term-group strategies, are controlled by virtue of being in a term group which contributes only once to the total weight irrespective of the number of