CRANV1P1
ASLIB Cranfield Research Project: Factors Determining the Performance of Indexing Systems: VOLUME 1. Design, Part 1. Text
Indexing Procedures
chapter
Cyril Cleverdon
Jack Mills
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
class prescribed in Q. 93 can be adjusted in order to coincide at some points with
the index descriptions of the documents. The class 'rarefied, partially ionized gas'
must be seen to correspond or relate, after suitable manipulation, to 'gases at small
pressure' in the one case and 'ionosphere' in the other. The class 'body' must be
seen to relate at some point to the class 'satellite'.
So a subject index must provide facilities for adjusting and manipulating its
classes; it must allow the index classes examined to be expanded or contracted,
and in different directions, until a match with the search prescription is recognized.
Index language devices are the agents of this manipulation. They are devices whereby
class definitions may be adjusted to meet the requirements of different searches.
Index language devices
The index description of a document is a condensed (usually a highly condensed)
statement of the document's subject content; it seeks to convey succinctly what the
document is about. Its main, and sometimes only, constituent is the set of substantive
terms (lexical elements) which act as clues to the subject of the document. These
terms may be supplemented by some indication of the relations between them
(syntactical elements), e.g., by the addition of roles, or facet indicators (explicit or
implicit) or by such elementary syntactical devices as those of the Alphabetical
subject catalogue. In a post-coordinate index they are usually kept to a minimum,
enough to remove serious ambiguity but no more.
It seems reasonable, then, to assume, as the simplest possible form of index
description, a bare list of words, selected directly from the title and text of a docu-
ment as being good clues to its content, and presented without any reference what-
soever to a control list for synonyms, related terms, etc.
The simplest way in which such a list of words could be used would be to regard
each word as defining one of the classes to which the document belonged, without
reference to the other words. Searches would then be made simply within these
classes, separately. For example a document indexed as being about Wakes -
Satellites - Traversing - Ionosphere would be seen simply as a member of four dif-
ferent classes (the class 'Documents dealing with Wakes', the class 'Documents dealing
with Satellites', and so on). So a search on Satellite wakes would be made simply
by examining all documents in the class Satellites, and all documents in the class
Wakes. This is very similar, of course, to what a Permuted Title or KWIC index
does. Recall performance figures for this crudest of all forms of index language were
assessed in the first Cranfield project as 97%. Precision figures were not available
but it is certain that they were very low. It is assumed that all the keywords constituting
the question are examined. If a selection were made, recall would probably
drop in so far as the exhaustivity of the searching would have dropped. The question
of exhaustivity and specificity of searching and indexing is discussed later.
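The single-word search described above can be sketched in a few lines of Python. This is only an illustrative sketch, not a procedure taken from the project: the function names build_classes and search are hypothetical, and of the four documents only the first (Wakes - Satellites - Traversing - Ionosphere) comes from the text; the other three are invented to make the example run. Query words are assumed to be given in exactly the form used in indexing, since this simplest index has no control over word forms.

```python
from collections import defaultdict

# Hypothetical toy collection: document id -> bare list of index words.
# Only document 1 is taken from the text; documents 2-4 are invented.
DOCUMENTS = {
    1: ["Wakes", "Satellites", "Traversing", "Ionosphere"],
    2: ["Satellites", "Orbits"],
    3: ["Wakes", "Ships"],
    4: ["Ionosphere", "Radio"],
}


def build_classes(documents):
    """Treat each index word as defining one class of documents."""
    classes = defaultdict(set)
    for doc_id, words in documents.items():
        for word in words:
            classes[word].add(doc_id)
    return dict(classes)


def search(classes, query_words):
    """Examine the class of each query word separately and pool the results."""
    found = set()
    for word in query_words:
        found |= classes.get(word, set())
    return found


classes = build_classes(DOCUMENTS)
retrieved = search(classes, ["Satellites", "Wakes"])
print(sorted(retrieved))  # [1, 2, 3]
```

In this toy collection the search pools documents 2 and 3, each of which carries only one of the two words, alongside document 1, which carries both. This is the kind of behaviour that would be expected to give high recall together with low precision.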
Now will be considered the ways in which, by the use of various devices, this
simplest of all possible forms of indexing can be refined in order to increase its
capabilities for meeting all the demands which search prescriptions may make on
it. Such devices may be separated conveniently into two groups:
I. recall devices - those which, when applied to any existing class, increase the
size of the class in terms of the documents responding to the definition; e.g., if
the class Bakelite is expanded by hierarchical linkage to include all Phenolic resins,
more documents are retrieved.