IRE
Information Retrieval Experiment
An experiment: search strategy variations in SDI profiles
chapter
Lynn Evans
Butterworth & Company
Karen Sparck Jones
(10) Co-ordinate matching of restricted list of terms with weights (CRTW)
The original list of search terms chosen to represent a particular user's
interests in strategies (1)-(9) above was not restricted in any way and
comprised, on average, more than 40 terms. It had been felt, mainly as
a result of experience with INSPEC's commercial SDI service where
similar-sized profiles pertained, that there might be a significant number
of unnecessary search terms in profiles, viz. those search terms which
'hit' too infrequently to be useful and those search terms which 'hit' too
often to be selective.
A further argument was that although the search profiles used in online
retrospective searching contain, in general, far fewer terms than those
in current-awareness batch SDI systems, their retrieval performance
does not seem to be noticeably inferior.
For these reasons it was thought worthwhile to include for comparison
a profile version in which the number of terms was restricted to 20
irrespective of the number in the original list used for search strategies
(1)-(9) above. The 20 terms were selected subjectively in order of
importance from those used in strategies (1)-(9).
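The restriction and matching just described can be illustrated with a minimal sketch. It assumes that a profile is a list of (term, weight) pairs already ordered by importance, that a document is represented by its free-index terms, and that a hit is declared when the summed weights of the matching terms reach a threshold. The term names, weights and threshold values are hypothetical and are not taken from the experiment.

```python
# Hypothetical sketch: term names, weights and thresholds are illustrative only.

def restrict_profile(ranked_terms, limit=20):
    """Keep the first `limit` (term, weight) pairs of a list ordered by importance."""
    return dict(ranked_terms[:limit])


def coordinate_score(profile, document_terms):
    """Sum the weights of the profile terms found among a document's index terms."""
    present = set(document_terms)
    return sum(weight for term, weight in profile.items() if term in present)


def is_hit(profile, document_terms, threshold):
    """Report a hit when the weighted co-ordination score reaches the threshold."""
    return coordinate_score(profile, document_terms) >= threshold


# A made-up profile of 25 terms, most important first, cut back to 20.
ranked = [("term%02d" % i, 25 - i) for i in range(25)]
profile = restrict_profile(ranked)

# "term22" was dropped by the restriction, so it no longer contributes.
document = ["term00", "term03", "term22", "unrelated"]
score = coordinate_score(profile, document)
```

In this sketch the restriction is a simple truncation of the ordered list; in the experiment the 20 terms were chosen subjectively, so the ordering itself stands in for that human judgement.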
(11) Controlled-language boolean strategy (CLB)
In the main experiment the medium used for matching profiles and
documents was natural language, viz. the free-index terms assigned to
all items added to the INSPEC database. In general the free-index
terms are words or phrases occurring in the original document which
represent the meaningful concepts treated in the document. In addition
to the free indexing, the subject content of all items in the INSPEC
database is indicated by two other elements: (i) classification codes,
which govern the location of the item in the published abstracts journals,
and (ii) controlled subject headings, which appear in the six-monthly
indexes to the abstracts and are used mainly for manual retrospective
searching. The classification codes and controlled subject headings can
of course also be used in machine searching and to this end boolean-type
profiles using only classification codes and/or subject headings were
prepared.
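A boolean-type profile of this kind might be sketched as follows. The sketch assumes that each document carries a set of classification codes and controlled subject headings, and that a profile is a nested AND/OR/NOT expression over them. The codes and headings shown are invented placeholders, not actual INSPEC values, and the representation is an assumption made for illustration.

```python
# Hypothetical sketch: the codes and headings below are placeholders,
# not real INSPEC classification codes or subject headings.

def evaluate(expr, doc_keys):
    """Evaluate a nested boolean profile against a document's codes and headings."""
    if isinstance(expr, str):
        # A bare string is a single code or heading to look up.
        return expr in doc_keys
    op, *operands = expr
    if op == "AND":
        return all(evaluate(e, doc_keys) for e in operands)
    if op == "OR":
        return any(evaluate(e, doc_keys) for e in operands)
    if op == "NOT":
        return not evaluate(operands[0], doc_keys)
    raise ValueError("unknown operator: %r" % (op,))


# (code A1 OR heading "example topic") AND NOT code B2
profile = ("AND",
           ("OR", "CODE:A1", "HEADING:example topic"),
           ("NOT", "CODE:B2"))

doc_a = {"CODE:A1", "HEADING:other topic"}
doc_b = {"HEADING:example topic", "CODE:B2"}

hit_a = evaluate(profile, doc_a)
hit_b = evaluate(profile, doc_b)
```

Here doc_a satisfies the profile, while doc_b is rejected because it carries the excluded code.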
Originally the main purpose of these controlled-language boolean
strategies had been to act as 'back-up' profiles to retrieve relevant
documents which the other versions (based on the free-index terms)
might miss because of inadequate free indexing or profiling. Knowledge
of these additional relevant items would of course mean that the recall
performance figures would be that much nearer to being measures of the
true rather than the relative recall. However, the data available also
allowed a direct comparison of the retrieval performance of
controlled-language against free-language boolean profiles. Brief details of this
secondary experiment are given in section 14.3 (p.309).
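The effect of the back-up profiles on the recall figures can be shown with a small worked example. It assumes the usual pooled definition of relative recall, i.e. the relevant items retrieved by one strategy divided by all the relevant items known from every strategy combined. The document identifiers and set sizes are invented for illustration.

```python
# Hypothetical sketch: document identifiers and set sizes are invented.

def relative_recall(retrieved_relevant, known_relevant):
    """Fraction of the known relevant items that one strategy retrieved."""
    if not known_relevant:
        return 0.0
    return len(retrieved_relevant & known_relevant) / len(known_relevant)


strategy_hits = {1, 2, 3}   # relevant items found by one free-language strategy
free_pool = {1, 2, 3, 4}    # relevant items found by all free-language strategies
backup_hits = {5, 6}        # further relevant items found only by back-up profiles

before = relative_recall(strategy_hits, free_pool)
after = relative_recall(strategy_hits, free_pool | backup_hits)
```

With these figures the strategy's recall falls from 0.75 to 0.5 once the two additional relevant items are known. The lower figure is the closer estimate of true recall, since the denominator now omits fewer of the relevant items.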
Profile compilation
All the tasks associated with translating the original user statements into
profiles incorporating the various search strategies were carried out by one
person under controlled conditions, which are described below.