CRANV1P1
ASLIB Cranfield Research Project: Factors Determining the Performance of Indexing Systems: VOLUME 1. Design, Part 1. Text
Indexing Procedures
chapter
Cyril Cleverdon
Jack Mills
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
2. precision devices - those which, applied to any existing class, decrease the
size; e.g., if we coordinate Bakelite with Extrusion and examine only the class
defined by this simple relationship of intersection, we exclude those documents on
Bakelite which do not refer specifically to this operation.
The operation of these devices on the sort of simple index description described
above can be seen by a consideration of the relations such a description displays
to the precise subject of the document concerned, and to the wider subject field of
the information store from which we may wish to retrieve that document or something
like it.
An index description a, b, c, d, e, f (where each letter represents a substantive
term or lexical element, e.g., Wing, Drag, Control) embodies two sets of relations:
firstly, those internal to it, reflecting the local and temporary conditions peculiar
to the subject of the document described; e.g., the fact that d is the product, whereas
f is an agent of the process a which produces it, or, the fact that b qualifies a while
c qualifies d, but that neither of these qualifiers is applicable to the object of the
other; or, more subjectively, that a and b, rather than c, d, e or f, represent the
dominant theme of the document. These are, broadly speaking, the interlocking relations
between the substantive terms.
The second set of relations are those external to it, reflecting the more permanent
pattern of relations in the wider field or subject area to which the document
belongs: e.g., that a is a species of x, or that c is almost synonymous with q, or
that a represents one participle of a term (e.g., Cooling) which may be usefully
related to another participle (e.g., Cooled) in the subject concerned.
The two sets of relations can be utilized to add precision to the original
description (using the first set) or to expand the description by reference to the wider
relations (using the second set). In other words, they underlie the two groups of
index language devices which will now be outlined briefly.
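The two sets of relations can be pictured with a small sketch. It is illustrative only: the letters follow the description a, b, c, d, e, f used in the text, while the relation names (product_of, agent_of, qualifies), the tuple layout and the helper functions holds and expand are assumptions introduced here.

```python
# Sketch only: the relation names and the tuple layout are assumptions.
# The letters follow the index description a, b, c, d, e, f in the text.
description = {"a", "b", "c", "d", "e", "f"}

# Internal relations: local to the subject of this one document.
internal = {
    ("product_of", "d", "a"),  # d is the product of process a
    ("agent_of", "f", "a"),    # f is an agent of process a
    ("qualifies", "b", "a"),   # b qualifies a
    ("qualifies", "c", "d"),   # c qualifies d
}
dominant = {"a", "b"}          # terms representing the dominant theme

# External relations: the more permanent pattern of the wider field.
external = {
    "a": {"x"},  # a is a species of x
    "c": {"q"},  # c is almost synonymous with q
}


def holds(kind, first, second):
    """Precision: accept a pairing only if this description asserts it."""
    return (kind, first, second) in internal


def expand(terms):
    """Expansion: add every term reachable by one external relation."""
    widened = set(terms)
    for term in terms:
        widened |= external.get(term, set())
    return widened


print(holds("qualifies", "b", "a"))  # b is recorded as qualifying a
print(holds("qualifies", "b", "d"))  # b is not recorded as qualifying d
print(sorted(expand(description)))   # the original terms plus x and q
```

Checking a pairing against the internal relations narrows what the description can match, whereas following the external relations widens it.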
Devices which increase precision
(i) Coordination - i.e., the conjunction of two or more terms to produce a
narrower class defined by the intersection; e.g., Shear and Flow to give Shear flow.
This is the most important device in indexing. Whilst it is commonly associated
with postcoordinate systems where it is implemented mainly if not entirely at the
search stage, it is equally fundamental to precoordinate systems; but in these, only
the products of selected coordinations are usually catered for conveniently.
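As a minimal sketch of coordination at the search stage, each term can be treated as the class (set) of documents indexed under it, and coordination as the intersection of those classes. The document numbers, the postings table and the function name coordinate are assumptions made for illustration.

```python
# Sketch only: the document numbers and the postings table are invented.
postings = {
    "Shear": {1, 2, 5},
    "Flow": {2, 3, 5, 8},
    "Bakelite": {4, 5, 9},
    "Extrusion": {5, 7},
}


def coordinate(postings, *terms):
    """Return the class of documents indexed under every one of the terms."""
    if not terms:
        raise ValueError("at least one term is required")
    # A term with no postings yields the empty class.
    classes = [postings.get(term, set()) for term in terms]
    return set.intersection(*classes)


print(sorted(coordinate(postings, "Shear")))                  # the single-term class
print(sorted(coordinate(postings, "Shear", "Flow")))          # narrower: Shear flow
print(sorted(coordinate(postings, "Bakelite", "Extrusion")))  # Bakelite with Extrusion
```

Each added term can only keep the retrieved class the same size or make it smaller, which is why coordination acts as a precision device.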
(ii) Weighting - i.e., the assignment to a term of a figure representing the relative
significance of that term in the total subject description of the document. So a term
which represents the central theme of the document gets a high weighting and one
which represents only a marginal element in the subject content of the document
gets a low weighting. If now a question is also weighted, i.e., greater significance
attaches to one or some of its terms than to others, then the search may be directed
only to coordinations with that term or only to the same term when it has been given
a similarly high value in indexing. In either case, the class of documents retrieved
is made narrower.
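The narrowing effect of weighting can be sketched as follows. The three-point scale (3 for a central theme, 1 for a marginal element), the document identifiers and the function name search are assumptions; the text does not prescribe a particular scale.

```python
# Sketch only: the 1-3 weight scale and the document ids are assumptions.
index = {
    "doc1": {"Shear": 3, "Flow": 3, "Wing": 1},
    "doc2": {"Shear": 1, "Flow": 2},
    "doc3": {"Shear": 3, "Drag": 2},
}


def search(index, question):
    """Return ids of documents that carry every question term at or above
    the weight the question attaches to that term."""
    return {
        doc_id
        for doc_id, weights in index.items()
        if all(weights.get(term, 0) >= minimum
               for term, minimum in question.items())
    }


# Unweighted question: any indexing weight is accepted for both terms.
unweighted = search(index, {"Shear": 1, "Flow": 1})

# Weighted question: Shear must also have been a central theme in indexing.
weighted = search(index, {"Shear": 3, "Flow": 1})

print(sorted(unweighted))
print(sorted(weighted))
```

Raising the weight demanded for a term never adds documents to the result, so the weighted search retrieves a subset of the unweighted one.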
(iii) Links - i.e., indicating a particular connection between two or more terms
in a description where the lack of such an indication would produce ambiguity; e.g.,
if the same document deals with the hardness of copper and conductivity of titanium
a link between Hardness and Copper on the one hand and between Conductivity and