CRANV1P1
ASLIB Cranfield Research Project: Factors Determining the Performance of Indexing Systems: VOLUME 1. Design, Part 1. Text
Indexing Procedures
chapter
Cyril Cleverdon
Jack Mills
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
The fifth step was to assign roles to the terms. At this stage a temporary failure
of plan has to be recorded. If roles were to be assigned, it was obviously desirable
to assign them at the time of indexing. Preliminary analyses were made of a number
of complete indexing descriptions in order to establish the major variations of role
occurring amongst the terms and the degree to which this might lead to ambiguity.
It was stated earlier that the main function of roles would seem to be the removal
of a certain type of ambiguity beyond the power of simple Links to remove. Links, it
should be remembered, simply assert that some relation exists between two or more
terms - between a and b, say, rather than a and c. A role states what the relation is.
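The distinction between Links and roles can be sketched as follows. This is a minimal illustration, not the project's own notation: it assumes a hypothetical layout in which each index entry is a (term, link, role) triple. The helper names `Entry`, `linked` and `roles_in_link` and the role labels are invented for illustration. A shared link number only asserts that some relation holds between two terms, while the role label says what that relation is.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    """One index entry: a term, the link it belongs to, and its role (hypothetical layout)."""
    term: str
    link: int
    role: str


def linked(entries, t1, t2):
    """True if t1 and t2 share a link, i.e. *some* relation between them is asserted."""
    links1 = {e.link for e in entries if e.term == t1}
    links2 = {e.link for e in entries if e.term == t2}
    return bool(links1 & links2)


def roles_in_link(entries, term, link):
    """The role(s) a term plays within one link, i.e. *what* the relation is."""
    return {e.role for e in entries if e.term == term and e.link == link}


# A description in which a is related to b (link 1) but not to c (link 2).
# Role labels are illustrative only.
doc = [Entry("a", 1, "Agent"), Entry("b", 1, "Thing affected"), Entry("c", 2, "Product")]

print(linked(doc, "a", "b"))       # True
print(linked(doc, "a", "c"))       # False
print(roles_in_link(doc, "a", 1))  # {'Agent'}
```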
From the preliminary analyses, it became apparent that the roles developed
by Western Reserve University, by the American Institute of Chemical Engineers and by
DuPont were not appropriate, designed as they were for subject areas (chemistry
and applied chemistry) in which the same term (e.g. a material) fairly frequently
appears in significantly different roles. The Cranfield investigation
of the W.R.U. index had already shown some of the drawbacks associated
with roles, even in those fields for which they seemed particularly appropriate, and
our analyses confirmed these. A major problem was the fact that when a term plays
one particular role in an indexed document, this does not necessarily make the document
irrelevant to a question in which the same term features in a different role:
for example, a document on the 'Use of mufflers to control the sound of jets' suggests
the roles: Muffler (Agent of operation), Control (Operation), Sound (Product), Jet
(Cause). If a question were now asked on the 'Effect of mufflers on jets', the role of
Muffler might well be designated as Influencing factor (Cause) and that of Jet as Thing
affected; the relevance of the document to this question would be obscured by the
different roles assigned to the otherwise matching key terms. Such a situation implies
that several quite different roles might be acceptable in searching; but this comes
dangerously near to negating the whole idea of roles. Such, in fact, was the situation
in the W.R.U. test, in which roles were frequently ignored in W.R.U. search programmes,
presumably because of an awareness of the danger of demanding too close a match.
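The mufflers example can be restated as three matching rules. The sketch below is an illustration under stated assumptions, not the W.R.U. or Cranfield search procedure: each indexing description is simplified to a hypothetical mapping from term to a single role, and the function names are invented. Term-only matching retrieves the document, strict role matching misses it, and accepting a set of alternative roles per question term retrieves it again, at the cost of moving back towards term-only matching.

```python
def matches_terms(doc, question):
    """Term-only matching: every question term occurs in the document; roles ignored."""
    return set(question) <= set(doc)


def matches_roles(doc, question):
    """Strict role matching: every question term occurs with exactly the same role."""
    return all(doc.get(term) == role for term, role in question.items())


def matches_any_role(doc, question):
    """Relaxed matching: the question lists a set of acceptable roles for each term."""
    return all(doc.get(term) in roles for term, roles in question.items())


# 'Use of mufflers to control the sound of jets' (one role per term, a simplification)
document = {"Muffler": "Agent of operation", "Control": "Operation",
            "Sound": "Product", "Jet": "Cause"}
# 'Effect of mufflers on jets'
question = {"Muffler": "Influencing factor", "Jet": "Thing affected"}

print(matches_terms(document, question))   # True
print(matches_roles(document, question))   # False

# Widening the acceptable roles recovers the document.
relaxed = {"Muffler": {"Influencing factor", "Agent of operation"},
           "Jet": {"Thing affected", "Cause"}}
print(matches_any_role(document, relaxed))  # True
```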
An alternative plan which seemed to be suggested by the kind of example given
above was that roles should not be assigned purely according to the relations between
the terms in the document or question concerned (their syntagmatic relations), but
also according to a wider picture of the relations between the terms in a subject area
(their paradigmatic relations). It is generally recognized that the organization of
terms into facets is closely related to the provision of role indicators. In special
faceted classifications it is not uncommon to find a term appearing in more than one
facet, the difference reflecting the role played; e.g., in a
classification for pharmaceuticals, the same substance might appear as a Product, a
Substance Extracted, an Agent of a reaction, or an Agent of an operation. In such
a system, where prior analysis of the terms of the vocabulary has been undertaken, the
more enduring relations (that a problem or by-product in the propagation of jets is
the production of noise, which demands control, and that mufflers are one agent of
control) would be recognized and the fortuitous alteration of roles suggested in the
question would not have been allowed to obscure the situation.
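The facet-based alternative can be sketched by taking roles from a vocabulary-wide table instead of from the wording of each document or question. This is a minimal sketch assuming a hypothetical facet table (`FACETS`, with invented facet names) in which each term belongs to exactly one facet; a real faceted scheme, as the pharmaceuticals example shows, may place a term in several. Because document and question draw their roles from the same table, the question's incidental phrasing no longer changes the roles and strict matching succeeds.

```python
# Hypothetical facet table standing in for a prior analysis of the vocabulary.
# Each term is assigned the role of the single facet it belongs to.
FACETS = {
    "Jet": "System",
    "Sound": "By-product",
    "Control": "Operation",
    "Muffler": "Agent of control",
}


def assign_roles(terms, facets=FACETS):
    """Give each term the role fixed for it in the facet table, ignoring local syntax."""
    return {term: facets[term] for term in terms}


def matches_roles(doc, question):
    """Strict role matching: every question term occurs in the document with the same role."""
    return all(doc.get(term) == role for term, role in question.items())


document = assign_roles(["Muffler", "Control", "Sound", "Jet"])  # 'Use of mufflers to control the sound of jets'
question = assign_roles(["Muffler", "Jet"])                       # 'Effect of mufflers on jets'

print(matches_roles(document, question))  # True
```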
Consequently, a set of roles was developed along these lines so that they closely
reflected the categories which would be distinguished in facet analysis of the field.
But trials of these (i.e., examination of a number of indexing descriptions in order
to see whether ambiguities would be removed by the roles) showed that they left
untouched what was probably the commonest problem of ambiguity in the vocabulary