CRANV1P1
ASLIB Cranfield Research Project: Factors Determining the Performance of Indexing Systems: VOLUME 1. Design, Part 1. Text
General Considerations
chapter
Cyril Cleverdon
Jack Mills
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
apart from a fundamental characteristic, and whatever the type of index language,
it can readily be provided with a complete list of sought terms, that is a "lead-in
vocabulary".
The second requirement for index languages is a set of index terms, while a
third requirement is a set of code terms. Before attempting to explain the differences,
it must first be said that in many index languages there will be some terms which
will occur in the triple role of a lead-in term, an index term and a code term.
Further, all index terms must be lead-in terms, and frequently the set of index
terms will be the same as the set of code terms. For examples of the three
types of terms, the Thesaurus of the Engineers Joint Council can be considered
(Ref. 27).
A lead-in term represents a concept which is described by another term
than itself. This may represent a synonym, e.g. Speed use Velocity, or may
be a subordination of a specific term to a more general term, e.g. Hexagonal
use Shape.
Code terms are those terms which are actually used in indexing, examples
being Velocity, Rotation, Engine noise, Jet engines.
Index terms are all Code terms, and additionally any combinations of
Code terms which make up and express new concepts. For instance, the
Index term 'Peripheral speed' is expressed by the use of the two Code terms
Rotation and Velocity, while the Index term 'Jet engine noise' is expressed
by the use of the Code terms Jet engines and Engine noise.
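The relationship between the three types of terms can be sketched in a few lines of Python. This is only an illustration built from the examples quoted above, not a representation of the Engineers Joint Council Thesaurus itself; the names CODE_TERMS, USE_REFERENCES, COMBINED_INDEX_TERMS and to_code_terms are hypothetical, and treating Shape as a code term is an assumption.

```python
# Code terms: the terms actually used in indexing (examples from the text).
# "Shape" is assumed to be a code term because "Hexagonal use Shape" points to it.
CODE_TERMS = {"Velocity", "Rotation", "Engine noise", "Jet engines", "Shape"}

# Lead-in terms that refer to one other term (a synonym or a more general term).
USE_REFERENCES = {"Speed": "Velocity", "Hexagonal": "Shape"}

# Index terms that are expressed by a combination of code terms.
COMBINED_INDEX_TERMS = {
    "Peripheral speed": ("Rotation", "Velocity"),
    "Jet engine noise": ("Jet engines", "Engine noise"),
}


def index_terms():
    """All code terms plus the combinations of code terms."""
    return CODE_TERMS | set(COMBINED_INDEX_TERMS)


def lead_in_vocabulary():
    """Every sought term: all index terms plus the 'use' references."""
    return index_terms() | set(USE_REFERENCES)


def to_code_terms(term):
    """Return the tuple of code terms under which a sought term is indexed."""
    if term in CODE_TERMS:
        return (term,)
    if term in USE_REFERENCES:
        return (USE_REFERENCES[term],)
    if term in COMBINED_INDEX_TERMS:
        return COMBINED_INDEX_TERMS[term]
    raise KeyError(f"{term!r} is not in the lead-in vocabulary")
```

In this sketch every index term is also a lead-in term, and a term such as Velocity occurs in all three roles, which is the situation described in the text.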
While these three types of terms, i.e., lead-in terms, index terms and code
terms, are normal ingredients of an index language, most index languages also make
use of auxiliary devices or aids. In a completely simple system, lead-in terms would
always be the index terms and the code terms, which is to say that terms would be
used exactly as they appeared in the literature. As soon as the set of index terms
is fewer in number than the set of lead-in terms, then a measure of control has been
introduced. This normally takes the form of combining terms which are synonyms,
and is only the first of many devices which are used in various ways to make up
different index languages. Nothing about such devices restricts their use to any
particular type of index language; precoordinate or post-coordinate, alphabetical
or classified, any type of index language can potentially be
given the same devices and thereby have the operational performance of any other
index language.
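The point at which control enters can be illustrated with a minimal Python sketch. It assumes that the only device applied is the combining of synonyms; the function names build_index_terms and is_controlled are hypothetical, and the terms are taken from the Speed and Velocity example given earlier.

```python
def build_index_terms(lead_in_terms, synonyms=None):
    """Map each lead-in term to its index term.

    With no synonym table, terms are used exactly as they appear in the
    literature, so the index terms equal the lead-in terms.
    """
    synonyms = synonyms or {}
    return {synonyms.get(term, term) for term in lead_in_terms}


def is_controlled(lead_in_terms, index_terms):
    """Control exists once there are fewer index terms than lead-in terms."""
    return len(set(index_terms)) < len(set(lead_in_terms))


literature_terms = ["Speed", "Velocity", "Rotation"]

# Completely simple system: no device applied.
uncontrolled = build_index_terms(literature_terms)

# First device: synonyms are combined under a single index term.
controlled = build_index_terms(literature_terms, {"Speed": "Velocity"})
```

Here the uncontrolled set keeps all three terms, while combining Speed with Velocity leaves two index terms for three lead-in terms, so a measure of control has been introduced.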
In his book "On retrieval system theory" (Ref. 9), Vickery identified seventeen
devices, and acknowledgement must be made that in the original project proposal,
these formed the basis of our argument. Vickery lists these devices as follows.
Means of control                          Field of use
1. No control.                            Some amateur alphabetical indexes.
2. Rigid control - fixed vocabulary       Some mechanized systems with limited
   of descriptors.                        coding capacity.
3. Confounding of variant word forms.     Professional alphabetical indexes,
                                          including Uniterm, and most other systems.
4. Confounding of true synonyms.          Ditto.
5. Confounding of near synonyms.          Some subject heading lists, some
                                          classifications, and systems based on thesauri.