CRANV1P1
ASLIB Cranfield Research Project: Factors Determining the Performance of Indexing Systems: VOLUME 1. Design, Part 1. Text
Indexing Procedures
chapter
Cyril Cleverdon
Jack Mills
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
not depend on the context of the individual report being indexed but on the context of
the index language as a whole; in other words, the relations were paradigmatic rather
than syntagmatic. It may be noted that all the devices measurable by Method (2) are
recall devices - i.e., devices for elaborating or expanding the classes given in the
index description of a document.
(3) To incorporate particular devices in the original indexing but in such a way as to
make them detachable when required; e.g., to attach weights to terms which could
be counted when figures for weighted indexing were required but ignored for unweighted
indexing. This method, also much less laborious than Method (1), would be particularly
appropriate for those precision devices which depended on the context of the individual
report being indexed, and which could not therefore be measured by Method (2).
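The detachable-device idea of Method (3) can be illustrated with a minimal sketch in Python. It is not the project's actual recording procedure: the record layout, the term names, the 1-to-3 weight scale (higher taken here to mean more important) and the function `terms_for` are all assumptions made for the example.

```python
# Hypothetical index record: each term carries a weight assigned once,
# at indexing time. The terms and the 1-3 scale are illustrative only.
indexing = {"boundary layer": 3, "heat transfer": 2, "flat plate": 1}


def terms_for(record, min_weight=None):
    """Return the index terms of one document.

    With min_weight=None the weights are ignored (unweighted indexing);
    otherwise only terms weighted at or above min_weight are kept.
    """
    if min_weight is None:
        return set(record)
    return {term for term, weight in record.items() if weight >= min_weight}


unweighted = terms_for(indexing)
weighted = terms_for(indexing, min_weight=2)
```

The same stored record serves both conditions, so the document need not be re-indexed to switch the device on or off.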
It was finally decided that the indexing proper should be done on the basis of
Method (3); that is to say, the indexing would be basically post coordinate, and take
into account only the precision devices of weighting, links and roles (whilst observing
a high degree of exhaustivity and specificity). Method (2) was to be used in the
measurement of the recall devices of Synonyms, Word-forms and Hierarchical linkage
(generic and non-generic). Associative indexing could not be measured within the
conditions above, but it was hoped that the indexing would be sufficiently exhaustive
to allow some tests of associative techniques to be made by other investigators.
Te Nuyl's device was also ignored at this point, since it was clear that our indexing
language could always be translated into dictionary-based clusters when necessary
for measurement by Method (2). Bibliographical coupling, since its classes are not
defined by subject description, required quite separate measurement and is discussed
in Chapter 7.
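A recall device of the kind measured by Method (2) can be sketched as an expansion applied at search time, leaving the original indexing untouched. The following Python fragment is a hedged illustration only: the synonym classes, the sample index and the names `expand` and `retrieve` are hypothetical and are not taken from the project's index language.

```python
# Hypothetical synonym classes. They belong to the index language as a
# whole and do not depend on any individual document.
SYNONYM_CLASSES = [
    {"velocity", "speed"},
    {"aerofoil", "airfoil"},
]

# Hypothetical index descriptions: document number -> set of terms.
index = {
    1: {"speed", "wing"},
    2: {"velocity"},
    3: {"pressure"},
}


def expand(term, classes):
    """Return the class containing term, or the term alone if it has none."""
    for cls in classes:
        if term in cls:
            return set(cls)
    return {term}


def retrieve(term, index, classes=()):
    """Return documents indexed by term or by any member of its class."""
    wanted = expand(term, classes)
    return {doc_id for doc_id, terms in index.items() if wanted & terms}


without_device = retrieve("velocity", index)
with_device = retrieve("velocity", index, SYNONYM_CLASSES)
```

Because the expansion only ever enlarges the class searched, the device can add documents to the result but never remove them, which is the sense in which it acts on recall.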
The major precision device of coordination is, in a post coordinate system, purely
a search device and its measurement does not fit exactly into either Method (2) or
(3). It is perhaps necessary to mention at this point that post coordination in itself
does not constitute an indexing device. It is essentially a method of recording subject
descriptions in a physical form which allows equally free access to whatever combina-
tions of terms are requested. A precoordinate system, on the other hand, allows
direct access only to certain selected combinations of terms. Other combinations
constitute distributed relatives and access to them is to this extent made less con-
venient (although it is by no means forbidden, or made impossible, as is sometimes
suggested). But the class defined by coordinating two or more terms is exactly the
same, whether the operation is performed at the indexing stage (precoordination) or
at the search stage (post coordination). The relative convenience with which access
to such a class is gained was not something with which this investigation was con-
cerned.
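The equivalence of the two classes can be shown with a small Python sketch. It rests on stated assumptions: the sample documents and terms are invented, and the precoordinate file below enumerates every pair of terms for simplicity, whereas a real precoordinate system would file only certain selected combinations.

```python
from itertools import combinations

# Hypothetical index descriptions: document number -> set of terms.
docs = {
    1: {"wing", "flutter"},
    2: {"wing", "drag"},
    3: {"wing", "flutter", "drag"},
}


def post_coordinate(docs, query):
    """Coordinate at the search stage by intersecting single-term postings.

    Assumes query holds at least one term.
    """
    postings = {}
    for doc_id, terms in docs.items():
        for term in terms:
            postings.setdefault(term, set()).add(doc_id)
    return set.intersection(*(postings.get(term, set()) for term in query))


def precoordinated_file(docs, size=2):
    """Coordinate at the indexing stage by filing each combination of terms."""
    entries = {}
    for doc_id, terms in docs.items():
        for combo in combinations(sorted(terms), size):
            entries.setdefault(frozenset(combo), set()).add(doc_id)
    return entries


query = {"wing", "flutter"}
at_search_stage = post_coordinate(docs, query)
at_indexing_stage = precoordinated_file(docs)[frozenset(query)]
```

Both routes yield the same set of documents; they differ only in when the combination is formed and in how directly it can be reached.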
The form in which the indexing was recorded is best shown by Fig. 4.1 which
shows the index sheet for document 1590. Author and title details were printed on
the sheet. The indexer then analysed the document in four stages; firstly, 'concepts'
were distinguished as a first-stage interfixing device ('link'); these are not easily
defined (and this must be recognized as a theoretical weakness) but their practical
function was reasonably clear. This was to remove the first level of vagueness and
ambiguity inherent in words taken singly, by not accepting adjectival forms alone
but only in conjunction with the terms they qualified. So terms which in isolation are
weak and virtually useless as retrieval handles were given the necessary context;
such terms as High, Number, Coefficient, Main, Trailing, Angle, Aspect which in
practice do not form classes for which requests are made, appeared in conjunction