CRANV1P1
ASLIB Cranfield Research Project: Factors Determining the Performance of Indexing Systems: VOLUME 1. Design, Part 1. Text
Indexing Procedures
chapter
Cyril Cleverdon
Jack Mills
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
An exhaustive and specific description (either in indexing or in question formulation)
is one which allows no further qualification or refinement - it is a completely precise
statement of the class concerned (or classes, if the terms are considered individually).
It seems clear that such a description should include not only the substantive terms
or lexical elements (which are the essential, and often the sole, constituents of most index
descriptions, at least for post-coordinate systems) but also the full range of interlocking
relations, or syntactical elements, which convey the exact relations between
these terms in that particular description. To take a rather far-fetched example,
another report might refer to a high wing at subsonic speeds; unless, in the first
example, High is interfixed or linked with Subsonic speed, the two different subjects
are not clearly distinguished. Unless we are to recognize these syntactic elements
as a third parameter in the precise description of a document, they must be regarded
as elements in exhaustivity and/or specificity. In the great majority of cases they
do not refer to the generic level of substantive terms but reflect non-generic relations;
they constitute 'relational' terms, analogous to the substantives. They are used as
such in a few indexing systems and, at least in theory, can themselves display
varying generic levels; e.g., Influence could be replaced by the more specific Harmful
Influence. It would seem, then, that these terms reflect both exhaustivity and
specificity, but more often the former. Only when the relation is explicitly a generic
one (as can be the case, for example, with Farradane's appurtenance operator; Ref. 23)
can they be said to determine specificity.
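To make the role of linking elements concrete, here is a minimal sketch of post-coordinate matching with and without links. It assumes a toy model in which each document's index entry is a list of links, each link being a set of interfixed terms. The two documents, their terms and the function names are hypothetical; the first document in particular is invented for illustration and is not taken from the report. Only grouping is modelled here, not explicit relational terms such as Influence.

```python
from itertools import chain

# Hypothetical index entries (invented for illustration). Each entry is a list
# of "links": sets of terms that are interfixed, i.e. explicitly related, in
# that document.
DOCUMENTS = {
    # a low wing at high subsonic speeds: High is linked with Subsonic speed
    "doc_A": [{"Low", "Wing"}, {"High", "Subsonic speed"}],
    # a high wing at subsonic speeds: High is linked with Wing
    "doc_B": [{"High", "Wing"}, {"Wing", "Subsonic speed"}],
}


def match_unlinked(query_terms, links):
    """Plain post-coordination: every query term occurs somewhere in the entry."""
    all_terms = set(chain.from_iterable(links))
    return set(query_terms) <= all_terms


def match_linked(query_groups, links):
    """Each group of query terms must fall inside a single link of the entry."""
    return all(any(set(group) <= link for link in links) for group in query_groups)


# Search for a high wing: compare unlinked and linked matching per document.
query = {"High", "Wing"}
for doc_id, links in DOCUMENTS.items():
    print(doc_id, match_unlinked(query, links), match_linked([query], links))
```

Without links, the search for a high wing retrieves both documents; requiring High and Wing to fall within one link excludes the document in which High qualifies Subsonic speed. A real link or role system would carry more structure than this simple grouping.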
Exhaustivity of indexing
Recall devices cannot be fully tested unless the indexing on which they are tested
is exhaustive; otherwise, loss of recall at any point might be attributable to a lack
of exhaustivity rather than to the device concerned. Maximum exhaustivity in indexing
was therefore attempted, at least as far as substantive terms (lexical elements) were
concerned. At the same time, since it was clearly desirable to measure the effect
of varying exhaustivity on different devices, it was necessary to note during indexing
which terms would in fact have been omitted had any level less than complete
exhaustivity been acceptable.
This problem was very conveniently solved by letting the figures assigned to terms
as weights serve also to indicate which terms would have been accepted at different
levels of indexing exhaustivity. At the highest level of exhaustivity, all terms would
be acceptable, whatever their weight. At the lowest level of exhaustivity, only the terms
given the highest weight (i.e., those which would be regarded as essential even in
relatively superficial indexing) would be acceptable.
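The use of weights as exhaustivity thresholds can be sketched as a simple filter. This assumes an index entry stored as a mapping from term to weight on the three-level scale (6, 8 and 10) that the project used; the terms and the function name `terms_at_level` are hypothetical.

```python
# Hypothetical weighted index entry: term -> weight on the 6/8/10 scale.
# The terms themselves are invented for illustration.
WEIGHTED_ENTRY = {
    "Wing": 10,
    "Subsonic speed": 10,
    "Pressure distribution": 8,
    "Wind tunnel test": 6,
}


def terms_at_level(entry, min_weight):
    """Return, sorted, the terms kept at a given exhaustivity level.

    Only terms whose weight is at least `min_weight` are retained, so a lower
    threshold corresponds to more exhaustive indexing.
    """
    return sorted(term for term, weight in entry.items() if weight >= min_weight)


for threshold in (6, 8, 10):
    print(threshold, terms_at_level(WEIGHTED_ENTRY, threshold))
```

Lowering the threshold from 10 to 6 moves from the least to the most exhaustive level, so a single weighted indexing pass yields every level without re-indexing.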
In the event, an average of 31 terms was used per document, with three levels of
weighting (6, 8 and 10). Counting only the terms weighted 8 or 10 gave an exhaustivity
level of 25 terms per document, and counting only those weighted 10 gave 13 terms
per document.
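Since the three averages are taken over the same documents, subtracting them gives the implied average number of terms at each weight. The following worked calculation is derived here rather than stated in the report, and it ignores any rounding in the quoted figures; $\bar n_w$ denotes the average number of terms per document with weight $w$.

```latex
\begin{aligned}
\bar n_{10} &= 13 \\
\bar n_{8}  &= 25 - 13 = 12 \\
\bar n_{6}  &= 31 - 25 = 6 \\
\frac{\bar n_{10}}{\bar n_{6} + \bar n_{8} + \bar n_{10}} &= \frac{13}{31} \approx 0.42,
\qquad
\frac{\bar n_{8} + \bar n_{10}}{\bar n_{6} + \bar n_{8} + \bar n_{10}} = \frac{25}{31} \approx 0.81
\end{aligned}
```

On these figures, the least exhaustive level keeps roughly two-fifths of the terms used at the most exhaustive level.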
Of course, 'complete exhaustivity' is a relative term here; strictly speaking, only
the use of the full text of the document, including diagrams, tables and graphs,
constitutes completely exhaustive indexing. But whilst the economics of mechanised
aids in indexing may eventually make this feasible and its testing desirable, the
degree of exhaustivity represented by 31 terms per document was thought to be a
reasonable approximation to what would be regarded, for documents in this particular
subject area, as extremely thorough indexing. The problem of syntactic elements
('relational terms') as an element in exhaustivity will be dealt with later.