IRE Information Retrieval Experiment An experiment: search strategy variations in SDI profiles chapter Lynn Evans Butterworth & Company Karen Sparck Jones

All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, including photocopying and recording, without the written permission of the copyright holder, application for which should be addressed to the Publishers. Such written permission must also be obtained before any part of this publication is stored in a retrieval system of any nature.

Some points of interest to emerge from Table 14.8 are:

(1) Strategy CRTW is confirmed, without qualification, as the most cost-effective search strategy.
(2) Compared with the ranking of strategies based on retrieval performance only (p.301), the strategies which have strikingly changed their relative positions are CT (upwards), GWC (downwards), and BW (downwards).
(3) The four strategies comprising basically a single list of search terms, viz. CRTW, CT, TWC and CTW, occupy the top four positions.

Free- and controlled-language comparison

Although not envisaged as part of the original project, the data that became available during the experiment was considered suitable for a direct comparison of free-language and controlled-language boolean profiles in the INSPEC environment. The data covered profile compilation times, number of search terms, and recall/precision performance figures. These are all detailed in the original report and are only summarized here.

The average compilation time for the controlled-language boolean profiles (31 min) was just less than half that for the free-language boolean profiles (65 min). The times recorded were for a compiler who was already familiar with the controlled language concerned, i.e. INSPEC's thesaurus and unified classification scheme.
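As a quick check on "just less than half", the ratio of the two average compilation times quoted above can be worked out directly. The symbols $\bar{t}_{\mathrm{CL}}$ (controlled-language) and $\bar{t}_{\mathrm{FL}}$ (free-language) are introduced here for illustration only and do not appear in the original report.

```latex
\[
  \frac{\bar{t}_{\mathrm{CL}}}{\bar{t}_{\mathrm{FL}}}
  = \frac{31\ \text{min}}{65\ \text{min}}
  \approx 0.48
\]
```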
These compilation times may be slightly biased in favour of the controlled-language profiles because the free-language versions were invariably compiled first. When the controlled-language version came to be compiled there would probably be some memory of the original user statement, even though the free-language version may have been compiled some weeks earlier.

As well as having shorter compilation times, the controlled-language boolean profiles were smaller than the free-language boolean profiles by a factor of 2+, averaging 19 terms and 47 terms respectively. It should be pointed out that one reason for the smaller number of search terms in the controlled-language profiles is that they were used in searching in a `free-text' way, i.e. extensive use was made of truncation in the search terms.

Assuming an approximately linear relationship between the number of profile search terms and computer search time, this factor of 2+ would be largely reflected in the computer search costs in favour of the controlled-language profiles. There is a further saving of search time for controlled-language profiles because the controlled-language searchable fields are smaller than the free-language searchable field in the INSPEC database. Statistics then current indicated that the relative sizes of the three fields were in the ratios:

    Free indexing                         10
    Subject headings (thesaurus terms)     5
    Unified classification codes           1

Because compilation of the controlled-language boolean profiles was started after the main experiment had got under way, the quantity of experimental data for the first few SDI runs was limited. Only the recall/precision figures for the last three of the eight runs were analysed, i.e. those for runs 6, 7 and 8. Overall values for recall and precision calculated by the usual two averaging methods (average of numbers and average of ratios)