IRE
Information Retrieval Experiment
An experiment: search strategy variations in SDI profiles
chapter
Lynn Evans
Butterworth & Company
Karen Sparck Jones
(4) Co-ordinate matching of groups of terms without weights (CG)
The profile search terms are divided into groups representing the various
concepts in the query. The output is ranked, first in order of group
co-ordination level (i.e. number of matching terms where each is from a
different profile group), and then in order of total number of matching
terms.
(5) Group-weight cumulation (GWC)
The profile term-groups of (4) are weighted according to the relative
importance of the groups (concepts) to the query. The weights of all
matching groups are summed to produce a document score. The output
is ranked in order of document scores.
(6) Group-/term-weight cumulation (GTWC)
The term-groups of (4) are weighted according to their importance to the
query and the individual terms are weighted according to their
importance within their group. The output is ranked, first by sum of
matching-group weights, second by sum of highest-weighted matching
terms from each group, and third by sum of all matching-term weights.
(7) Co-ordinate matching of groups of terms with weights (CGW)
The profile term-groups of (4) and the individual terms are weighted. The
output is ranked, first in order of group co-ordination level, second in
order of sum of matching-group weights, and third in order of sum of
matching-term weights.
(8) Boolean logic (B)
The profile term-groups of (4) are governed by boolean statements which
must be satisfied before any output is obtained. The output is in document
number order, i.e. unranked.
(9) Boolean logic with weights (BW)
The profile term-groups of (4) are governed by boolean statements
which must be satisfied before any output is obtained. After the
boolean equations are satisfied the ranking of output may be based on
group- and/or term-weight cumulation procedures. In our experiment
only term weights were used.
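
The ranking rules of procedures (4) to (7) can be read as multi-level sort keys, compared left to right. The following is a minimal sketch under stated assumptions, not the program used in the experiment: the names Group, matches, rank_key and rank, the sample terms, and all weight values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Group:
    """One concept of the query (hypothetical structure)."""
    weight: float   # importance of the group to the query
    terms: dict     # term -> importance of the term within its group


def matches(profile, doc_terms):
    """For each group, collect the weights of the terms found in the document."""
    return [(g.weight, [w for t, w in g.terms.items() if t in doc_terms])
            for g in profile]


def rank_key(procedure, profile, doc_terms):
    """Sort key for one document; a larger tuple means a higher rank."""
    hit = [(gw, tw) for gw, tw in matches(profile, doc_terms) if tw]
    level = len(hit)                           # group co-ordination level
    n_terms = sum(len(tw) for _, tw in hit)    # total number of matching terms
    group_sum = sum(gw for gw, _ in hit)       # sum of matching-group weights
    best_sum = sum(max(tw) for _, tw in hit)   # highest-weighted term per group
    term_sum = sum(sum(tw) for _, tw in hit)   # sum of all matching-term weights
    keys = {
        4: (level, n_terms),                   # groups without weights
        5: (group_sum,),                       # group-weight cumulation
        6: (group_sum, best_sum, term_sum),    # group-/term-weight cumulation
        7: (level, group_sum, term_sum),       # groups with weights
    }
    return keys[procedure]


def rank(procedure, profile, docs):
    """Return document ids, best first; docs maps id -> set of index terms."""
    return sorted(docs, key=lambda d: rank_key(procedure, profile, docs[d]),
                  reverse=True)


# Illustrative profile and documents (invented for this sketch).
profile = [
    Group(4, {"laser": 2, "maser": 1}),
    Group(2, {"diode": 2, "semiconductor": 1}),
    Group(1, {"modulation": 1}),
]
docs = {
    "d1": {"laser", "maser"},
    "d2": {"laser", "diode"},
    "d3": {"diode", "semiconductor", "modulation"},
}

for procedure in (4, 5, 6, 7):
    print(procedure, rank(procedure, profile, docs))
```

With these invented values, procedure (4) places d3 first because it matches two groups and three terms, whereas procedures (5) and (6) place d2 first because it matches the two most heavily weighted groups.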
Basically procedures (1), (2) and (3) involve search profiles comprising a
single list of terms (weighted or unweighted) while procedures (4) to (9)
inclusive involve profiles comprising groups of terms (in which groups and/or
terms may be weighted or unweighted).
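
The two boolean procedures, (8) and (9), differ only in what happens after the boolean statement is satisfied. The sketch below is illustrative only. It assumes the simplest possible boolean statement, namely that at least one term from every group must be present (OR within a group, AND between groups); the statements actually written for the experiment are not given in this section. The function names, terms, weights and document numbers are hypothetical.

```python
def satisfies(profile, doc_terms):
    """Assumed boolean statement: at least one term from every group."""
    return all(any(t in doc_terms for t in group) for group in profile)


def search_b(profile, docs):
    """Procedure (8): matching documents in document-number order (unranked)."""
    return sorted(d for d, terms in docs.items() if satisfies(profile, terms))


def search_bw(profile, docs):
    """Procedure (9): boolean filter first, then rank by summed term weights."""
    def score(d):
        return sum(w for group in profile
                   for t, w in group.items() if t in docs[d])
    return sorted(search_b(profile, docs), key=score, reverse=True)


# Each group maps term -> subjectively assigned weight (invented values).
profile = [
    {"laser": 3, "maser": 1},
    {"diode": 2, "semiconductor": 1},
]
docs = {
    101: {"laser", "diode"},
    102: {"maser", "semiconductor"},
    103: {"laser"},
    104: {"laser", "maser", "diode"},
}

print(search_b(profile, docs))
print(search_bw(profile, docs))
```

Document 103 is rejected by both procedures because it matches no term from the second group, however highly its single matching term is weighted. This is the essential difference from the cumulation procedures, in which any match contributes to the score.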
In the weighted profile versions two types of weights were used. In
procedures (2), (3), (5) and (9) above, the weights were subjectively assigned
by the compiler, while in procedures (6) and (7) 'powers of 2' weighting was
used⁹. In 'powers of 2' weighting the weights are assigned routinely once the
order of importance of individual terms and/or term groups in the search
profile has been intellectually decided. This ordering was again decided by
the compiler.
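
As an illustration of how such weights can follow mechanically from an ordering, the sketch below gives the least important item a weight of 1 and doubles the weight at each step up the ordering. This particular assignment is an assumption made for the example; the exact scheme is the one described in reference 9 and is not reproduced here. The function name and the sample terms are hypothetical.

```python
def powers_of_two(ordered_items):
    """Map items, listed most important first, to weights ..., 4, 2, 1."""
    n = len(ordered_items)
    return {item: 2 ** (n - 1 - i) for i, item in enumerate(ordered_items)}


# The compiler decides only the order; the weights then follow routinely.
weights = powers_of_two(["laser", "diode", "modulation"])
print(weights)
```

Under this assumed assignment each weight exceeds the sum of all the weights below it (here 4 > 2 + 1), so a match on a more important item can never be outscored by any combination of matches on less important items.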
Profiles incorporating automatically-assigned weights were not included
in the study mainly because the necessary statistics (term frequencies, etc.)
were not immediately available. They were to become available subsequently
from another INSPEC research project.
In addition to the nine search strategies listed above, as the project proceeded
it was decided to include a further two types: