CRANV1P1
ASLIB Cranfield Research Project: Factors Determining the Performance of Indexing Systems: VOLUME 1. Design, Part 1. Text
Indexing Procedures
chapter
Cyril Cleverdon
Jack Mills
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
Specificity of indexing
Precision devices cannot be fully tested unless the indexing language on which
they are tested is of maximum specificity. For example, a question on Elliptical
cylinders can only be matched specifically if, whenever that concept appears during
indexing, it is represented by the exact description Elliptical cylinders and not by
a more general term such as Cylinders alone. If the indexing is not specific in the
first place, there is nothing the searcher can do to improve precision by altering
his search programme.
At the level of substantives, or lexical elements, specificity was fairly easy
to achieve, since, by adhering closely to the language of the document and indexing
exhaustively, it was reasonably certain that the specific subject of a theme or concept
would be brought out. Even if the author used a more general term in the title or
summary, as was often the case, the specific term would nearly always appear
somewhere in the text. For example, the title might refer to a 'Laminar boundary layer',
the summary to an 'Incompressible boundary layer' and the body of the text to a
'Steady, laminar, incompressible boundary layer'; the indexing would give Steady,
laminar, incompressible boundary layer.
Effect on indexing procedures of methods of measurement
Having provided for the control of the major parameters of exhaustivity and
specificity, the problem arose of how the different devices might be added, one by
one, to the basic natural index language, so as to allow for their measurement.
Several possible methods of proceeding now presented themselves:
(1) To make one index completely devoid of any devices, and concurrently, to make
a number of separate indexes, each one embodying this first index modified by a single
device, e.g. one index in which the varying word forms of a term were confounded,
another in which hierarchical linkages were established, etc.
(2) To make a device-less index and measure the impact of devices entirely by
variations in search programming; e.g., the result of confounding synonyms could be
measured simply by programming a search for 'Disturbance' as 'Disturbance +
Perturbation', whereby the expansion of a class is achieved simply by making a sum of
the constituent parts of the expanded class. Similarly, measurement of the effect of
confounding word forms could be effected by programmes such as 'Injecting + Injection +
Injector . . .'
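The 'sum of the constituent parts' described in method (2) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy index, the document numbers and the function name `search_sum` are invented here and are not part of the project's procedures, and '+' is read as a logical sum (set union) of the documents posted under each term.

```python
# Hypothetical device-less index: each term maps to the set of documents
# indexed under exactly that term (document numbers are invented).
INDEX = {
    "Disturbance": {1, 4},
    "Perturbation": {2, 4},
    "Injecting": {3},
    "Injection": {5},
    "Injector": {6},
}

def search_sum(index, terms):
    """Return the union of postings for the given terms: 'A + B + ...'."""
    result = set()
    for term in terms:
        result |= index.get(term, set())
    return result

# Search without any device: the single term as asked.
plain = search_sum(INDEX, ["Disturbance"])

# Synonyms confounded at search time: 'Disturbance + Perturbation'.
expanded = search_sum(INDEX, ["Disturbance", "Perturbation"])

# Word forms confounded at search time: 'Injecting + Injection + Injector'.
word_forms = search_sum(INDEX, ["Injecting", "Injection", "Injector"])
```

The index itself is never altered; comparing `plain` with `expanded` gives the effect of the synonym device on this one search.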
In comparison with (1), this method would obviously be much less laborious
clerically, even in the case of hierarchical linkage. It might be thought that this
could best be measured by constructing a classified index; but the latter is an amalgam
of several devices, and strict hierarchical linkage in isolation can be
measured quite effectively by such programmes as: Wave + [Wave x (N + Standing +
Blast + Shock)] to expand an initial class like 'N Wave' to the generic containing class
'Wave'.
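A minimal sketch of such a hierarchical search programme follows, assuming that '+' denotes a logical sum (union) and 'x' a logical product (intersection) over single-word postings. The postings, the document numbers, the `HIERARCHY` dictionary and the function names are hypothetical and serve only to illustrate the expansion from 'N Wave' to the containing class 'Wave'.

```python
# Hypothetical single-word postings (document numbers are invented).
POSTINGS = {
    "Wave": {1, 2, 3, 4},
    "N": {1, 7},
    "Standing": {2},
    "Blast": set(),
    "Shock": {3, 8},
}

# Hypothetical hierarchy dictionary: generic term -> qualifiers of its species.
HIERARCHY = {"Wave": ["N", "Standing", "Blast", "Shock"]}

def postings(term):
    """Documents indexed under a single word (empty set if unknown)."""
    return POSTINGS.get(term, set())

def specific(generic, qualifier):
    """'Wave x N': documents indexed by both words."""
    return postings(generic) & postings(qualifier)

def expanded(generic):
    """'Wave + [Wave x (N + Standing + Blast + Shock)]'."""
    species = set()
    for qualifier in HIERARCHY[generic]:
        species |= postings(qualifier)
    return postings(generic) | (postings(generic) & species)

n_wave = specific("Wave", "N")   # the initial, specific class
wave_class = expanded("Wave")    # the generic containing class
```

With single-word postings as modelled here, the bracketed product is already contained in the postings for 'Wave'; it is kept only to mirror the programme as written in the text.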
Of course, such search programmes required the compilation of code dictionaries -
of synonyms, of word-forms, of hierarchies (i.e. of classification schedules). But
the indexing itself, so far as these recall devices were concerned, could be done
without any regard to these devices whatsoever, since the relations concerned did