IRE Information Retrieval Experiment An experiment: search strategy variations in SDI profiles chapter Lynn Evans Butterworth & Company Karen Sparck Jones All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, including photocopying and recording, without the written permission of the copyright holder, application for which should be addressed to the Publishers. Such written permission must also be obtained before any part of this publication is stored in a retrieval system of any nature.

(10) Co-ordinate matching of restricted list of terms with weights (CRTW)

The original list of search terms chosen to represent a particular user's interests in strategies (1)-(9) above was not restricted in any way and comprised, on average, more than 40 terms. It had been felt, mainly as a result of experience with INSPEC's commercial SDI service, where profiles of similar size were used, that profiles might contain a significant number of unnecessary search terms, viz. terms which `hit' too infrequently to be useful and terms which `hit' too often to be selective. A further argument was that although the search profiles used in online retrospective searching generally contain far fewer terms than those in current-awareness batch SDI systems, their retrieval performance does not seem to be noticeably inferior. For these reasons it was thought worthwhile to include, for comparison, a profile version in which the number of terms was restricted to 20, irrespective of the number in the original list used for strategies (1)-(9). The 20 terms were selected subjectively, in order of importance, from those used in strategies (1)-(9).

(11) Controlled-language boolean strategy (CLB)

In the main experiment the medium used for matching profiles and documents was natural language, viz.
the free-index terms assigned to all items added to the INSPEC database. In general, the free-index terms are words or phrases occurring in the original document which represent the meaningful concepts it treats. In addition to the free indexing, the subject content of every item in the INSPEC database is indicated by two other elements: (i) classification codes, which govern the location of the item in the published abstracts journals, and (ii) controlled subject headings, which appear in the six-monthly indexes to the abstracts and are used mainly for manual retrospective searching. The classification codes and controlled subject headings can, of course, also be used in machine searching, and to this end boolean-type profiles using only classification codes and/or subject headings were prepared. Originally, the main purpose of these controlled-language boolean strategies had been to act as `back-up' profiles, retrieving relevant documents which the other versions (based on the free-index terms) might miss because of inadequate free indexing or profiling. Since recall can only be calculated against the relevant documents actually identified, knowledge of these additional relevant items would bring the recall figures that much nearer to measures of true rather than relative recall. However, the data also allowed a direct comparison of the retrieval performance of controlled-language and free-language boolean profiles. Brief details of this secondary experiment are given in section 14.3 (p. 309).

Profile compilation

All the tasks associated with translating the original user statements into profiles incorporating the various search strategies were carried out by one person under controlled conditions. These are described below.
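Returning briefly to strategies (10) and (11): the Python sketch below illustrates one plausible reading of the two matching modes. It makes several assumptions. The term names, the weights, the retrieval threshold and the classification codes are all hypothetical. The "any listed code AND all listed headings" structure of the boolean profile is also an assumption, since actual profiles could combine codes and headings in other ways. The sketch shows how truncating an importance-ordered term list to 20 terms changes a weighted co-ordinate score, and how a controlled-language boolean profile tests a document's codes and headings.

```python
# Hypothetical sketch: term names, weights, threshold and codes are
# illustrative assumptions, not data from the experiment.

Profile = list[tuple[str, int]]  # (search term, weight), most important first


def restrict_profile(ranked: Profile, limit: int = 20) -> Profile:
    """Strategy (10): keep only the first `limit` terms of a profile
    already ordered by judged importance."""
    return ranked[:limit]


def coordinate_score(profile: Profile, doc_terms: set[str]) -> int:
    """Weighted co-ordinate matching: sum the weights of profile terms
    that also appear among the document's free-index terms."""
    return sum(weight for term, weight in profile if term in doc_terms)


def matches_weighted(profile: Profile, doc_terms: set[str], threshold: int) -> bool:
    """Retrieve the document if its score reaches an assumed threshold."""
    return coordinate_score(profile, doc_terms) >= threshold


def matches_boolean(doc_codes: set[str], doc_headings: set[str],
                    any_codes: set[str], all_headings: set[str]) -> bool:
    """Strategy (11), assumed form: at least one listed classification
    code AND every listed controlled subject heading."""
    return bool(doc_codes & any_codes) and all_headings <= doc_headings


if __name__ == "__main__":
    # Hypothetical 45-term profile: first 10 terms weight 5, the rest weight 1.
    full = [(f"term{i:02d}", 5 if i < 10 else 1) for i in range(45)]
    short = restrict_profile(full)
    doc = {"term00", "term03", "term30"}  # term30 is outside the top 20
    print(coordinate_score(full, doc))    # 11
    print(coordinate_score(short, doc))   # 10
    print(matches_boolean({"code_A"}, {"heading_X", "heading_Y"},
                          {"code_A", "code_B"}, {"heading_X"}))  # True
```

In this made-up example the document scores 11 against the full profile but only 10 against the restricted one, because one of its matching terms falls outside the top 20. This is the kind of effect that the comparison in strategy (10) was intended to measure.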