CRANV1P1
ASLIB Cranfield Research Project: Factors Determining the Performance of Indexing Systems: VOLUME 1. Design, Part 1. Text
Indexing Procedures
chapter
Cyril Cleverdon
Jack Mills
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
with other terms, to produce meaningful class terms - e.g., High subsonic speeds,
Mach number, Pitching moment coefficient, Main wing, Trailing edge, Low angle
of attack, High aspect ratio. Not only 'weak' terms were combined, however, for
the interfixing function of concepts often produces phrases whose constituent terms
are quite potent, index-wise, even in isolation. For example, Cruciform wing,
Circular body, Tail fins, Span loading theory, Force divergence Mach number, Wing
vortex field, Rectangular wing surface, Wind tunnel wall. Such combinations make
it clear, for example, that in the one document Surface relates to Rectangular wing,
not Cruciform wing; that Mach number relates to Force divergence rather than to
some other phenomenon; that Low relates to Angle of attack and High to Aspect ratio
(and not vice-versa). Or, as in Document 1590, that Distribution relates to Velocity
and Ratio relates to Total Pressure, and not vice-versa.
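
The effect of interfixing on false coordinations can be illustrated with a short Python sketch. It is not the project's procedure: the function names and the sample document are assumptions made for illustration, and only the concept phrases are taken from the text above.

```python
# Illustrative sketch only: function names and the sample document are
# assumptions; the concept phrases are those quoted in the text.

def single_terms(concepts):
    """Flatten interfixed concepts into an unordered set of single terms."""
    return {word.lower() for concept in concepts for word in concept.split()}

def matches_terms(doc_concepts, query_concept):
    """Match on isolated terms: every query word occurs somewhere in the document."""
    return single_terms([query_concept]) <= single_terms(doc_concepts)

def matches_concepts(doc_concepts, query_concept):
    """Match on interfixed concepts: the query phrase must be one of the document's concepts."""
    wanted = query_concept.lower()
    return any(concept.lower() == wanted for concept in doc_concepts)

# Hypothetical document indexed with two of the concepts quoted above.
doc = ["Low angle of attack", "High aspect ratio"]

# The document does not deal with a high angle of attack, yet its isolated
# terms coordinate with that question; the interfixed concepts do not.
false_drop = matches_terms(doc, "High angle of attack")
blocked = matches_concepts(doc, "High angle of attack")
genuine = matches_concepts(doc, "Low angle of attack")
```

Here `false_drop` is true while `blocked` is false: keeping Low bound to Angle of attack and High bound to Aspect ratio prevents the unwanted match without affecting the genuine one.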
Secondly, the concepts were now grouped into a second-stage link device in
order to display the distinct 'themes' into which the document could be partitioned.
'Partitioning' of a document is a well-established procedure in traditional
precoordinate indexing and is often referred to as analytical cataloguing. Owing to the
exigencies of space in the precoordinate index, such analysis is usually confined to
items in which the constituent chapters, sections, etc. can stand alone; examples
are symposia of various kinds, festschriften, and collections of plays. In such cases,
'standing alone' could be interpreted almost literally in the sense that each theme is
dealt with in a distinct, self-contained physical section of the document. In such
circumstances, partitioning could allow greater recall (resulting from greater
exhaustivity of indexing) with almost no loss of precision. This was rarely the case in
the aerodynamics reports constituting the test collection. In these, a particular con-
cept might run as a thread throughout the document, appearing at different times in
different contexts. So themes were not necessarily, or usually, mutually exclusive.
This diffusion of various concepts throughout a document seems to be an important
cause of many problems in retrieval and particularly that of the inverse relation
between recall and precision; for a document whose index description contains all or
most of the terms of the question prescription may yet feature those terms in an un-
acceptable pattern.
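
The difference between matching a question against the whole index description and matching it against individual themes can be sketched as follows. This is a minimal illustration under stated assumptions: the two themes and the question are hypothetical (they are not those of Fig. 4.1), and a theme is modelled simply as a set of concepts.

```python
# Illustrative sketch only: the themes and the question are hypothetical.

def document_level_match(themes, query):
    """True when all query concepts occur somewhere in the document."""
    all_concepts = set().union(*themes)
    return query <= all_concepts

def theme_level_match(themes, query):
    """True only when a single theme contains every query concept."""
    return any(query <= theme for theme in themes)

# A hypothetical document partitioned into two themes.
themes = [
    {"axial flow compressor", "stage performance", "test data"},
    {"velocity distribution", "stage characteristics"},
]

# The question's concepts all appear in the document, but in different themes.
query = {"stage performance", "velocity distribution"}

whole_document = document_level_match(themes, query)
within_theme = theme_level_match(themes, query)
```

In this sketch `whole_document` is true and `within_theme` is false: the document contains every term of the question but not in an acceptable pattern. Requiring a match within one theme favours precision; because themes in the test collection were rarely mutually exclusive, it may also cost some recall.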
The example in Fig. 4.1 (Doc. 1590) demonstrates the salient features of the two
stages of linking embodied in the indexing. To economise in the writing down of
themes, the concepts were labelled with lower case letters and only these appeared
in the themes. Where the relationship between concepts appeared to be potentially
ambiguous, it was indicated in an elementary fashion by verbal quasi-role devices
such as 'effect of', 'by means of' or 'use of'. So the first theme of document 1590
is to be read as Axial flow compressor - Stage performance - Effect of Stage
characteristics - Use of Test Data - Analysis.
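
The labelling convention can be sketched in Python as a lookup from letters to concepts, with an optional quasi-role device attached to each letter. The letters a to e and the data layout are assumptions made for illustration; the actual lettering of Fig. 4.1 is not reproduced here. Only the expanded reading of the theme comes from the text.

```python
# Illustrative sketch only: the letter assignments are hypothetical.
concepts = {
    "a": "Axial flow compressor",
    "b": "Stage performance",
    "c": "Stage characteristics",
    "d": "Test Data",
    "e": "Analysis",
}

# A theme is a sequence of (quasi-role, concept label) pairs. The role is
# None where the relationship between concepts is unambiguous without it.
theme = [
    (None, "a"),
    (None, "b"),
    ("Effect of", "c"),
    ("Use of", "d"),
    (None, "e"),
]

def expand_theme(theme, concepts):
    """Replace each label with its concept, prefixing the quasi-role if present."""
    parts = []
    for role, label in theme:
        phrase = concepts[label]
        parts.append(f"{role} {phrase}" if role else phrase)
    return " - ".join(parts)

expanded = expand_theme(theme, concepts)
```

With these assumed labels, `expanded` reproduces the reading of the first theme given above.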
Normal practice was to give as the first theme (or first few themes where neces-
sary), the general subject of the document considered as an integrated whole. Sub-
sequent themes would then bring out the particular subjects which made up the whole.
In document 1590, for example, themes A and B jointly represent a formal statement
of the title, with the addition of Test Data and Analysis. It also demonstrates a fairly
common situation whereby the title provides a reliable and succinct statement of the
document's general theme.
The third step in indexing was to give weights to the concepts (and subsequently
to the terms) - i.e., to allocate to each concept a value indicative of its relative
importance in the document. Such a value can be regarded as an assessment of the
probability that, should the concept concerned happen to be the subject of a question,