CRANV1P1
ASLIB Cranfield Research Project: Factors Determining the Performance of Indexing Systems: VOLUME 1. Design, Part 1. Text
Formation of Index Languages
chapter
Cyril Cleverdon
Jack Mills
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
degree of precoordination. It has already been shown how the cluster of concepts
around a given term (which might also be a root term for a number of word forms)
such as Heat or Interference reflects a variety of relations; e. g., Interference, Blockage;
Interference, Forebody; Interference, Jet; these all reflect kinds of interference
according to source. Interference filters reflects Interference as an experimental
agent (in temperature measurement); Interference load reflects Interference as a
source of another phenomenon. When these assorted relations are added to a certain
degree of word-form confounding (e. g., expanding an initial enquiry for Dissociation
by the addition of classes like Dissociated stream or Dissociating fraction), the result
is an eclectic recall device which utilizes elements of hierarchy, non-generic hierarchy,
confounding of word-forms, and linking (an element of precoordination is essential to
the programme). Such a mixture cannot, however, rank as a 'device' in the way this
notion was understood in Chapter 4. It is further considered in the next section.
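
The clustering of concepts around a root term can be pictured with a small sketch. It is only an illustration in Python: the concept names come from the examples above, but the ROOTS table and the functions root_of and cluster_by_root are hypothetical and are not part of the project's programmes.

```python
from collections import defaultdict

# Concepts quoted in the text above.
CONCEPTS = [
    "Interference, Blockage",
    "Interference, Forebody",
    "Interference, Jet",
    "Interference filters",
    "Interference load",
    "Dissociation",
    "Dissociated stream",
    "Dissociating fraction",
]

# Hypothetical table mapping word forms to a root term (an assumption).
ROOTS = {
    "interference": "Interference",
    "dissociation": "Dissociation",
    "dissociated": "Dissociation",
    "dissociating": "Dissociation",
}

def root_of(concept):
    """Return the root term of the concept's leading word, or None."""
    first_word = concept.replace(",", " ").split()[0].lower()
    return ROOTS.get(first_word)

def cluster_by_root(concepts):
    """Group concepts under the root term of their leading word."""
    clusters = defaultdict(list)
    for concept in concepts:
        root = root_of(concept)
        if root is not None:
            clusters[root].append(concept)
    return dict(clusters)

CLUSTERS = cluster_by_root(CONCEPTS)
```

In this sketch the Interference cluster holds kinds of interference by source, an experimental agent, and a source of another phenomenon side by side. The grouping cannot tell these relations apart, which is the sense in which the mixture is eclectic.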
Formation of Classes by Search Programmes
A significant feature of hierarchical linkage as an indexing device is the rich
variety of relations it displays, enabling a number of different paths to be pursued in
adjusting the size and content of the class or classes with which a search begins.
Some of these paths were briefly mentioned in the last section, using the example of
Visualization tests.
In exploiting these relations two different policies can be followed. Either classes
are expanded by bringing in all the terms related in a particular way (e. g., all the
terms subordinate to the original one, as when all the different kinds of compressors
are added to a search for Compressors), or classes are expanded eclectically,
choosing just those members of a given relationship which seem most likely to be
relevant in the context of the whole question. The latter policy is the one normally
followed in the conventional classified index.
The former policy has the merit of simplicity in programming (once the schedules
are established) and this is clearly pertinent in the case of machine searching and is,
in fact, generally implied by the term 'generic search'. Equally obvious is the fact
that it will tend to result in a lower precision ratio than a selective search, but
possibly also a higher recall ratio.
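
The contrast between the two policies can be sketched as follows. This is a minimal illustration in Python, assuming a toy schedule, invented postings and invented relevance judgements. The names NARROWER, POSTINGS, RELEVANT, generic_expand, selective_expand, retrieve and ratios are hypothetical and do not come from the Cranfield programmes.

```python
# Toy schedule (assumed): each term maps to its directly subordinate terms.
NARROWER = {
    "Compressors": ["Axial compressors", "Centrifugal compressors"],
    "Axial compressors": ["Multistage axial compressors"],
}

# Invented postings: term -> set of document numbers indexed by that term.
POSTINGS = {
    "Compressors": {1, 2},
    "Axial compressors": {3, 4},
    "Centrifugal compressors": {5, 6},
    "Multistage axial compressors": {7},
}

def generic_expand(term):
    """Former policy: the starting term plus every subordinate term."""
    terms, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in terms:
            terms.add(current)
            stack.extend(NARROWER.get(current, []))
    return terms

def selective_expand(term, chosen):
    """Latter policy: the starting term plus only the subordinates chosen."""
    return {term} | (generic_expand(term) & set(chosen))

def retrieve(terms):
    """Union of the postings of every term in the class."""
    docs = set()
    for t in terms:
        docs |= POSTINGS.get(t, set())
    return docs

def ratios(retrieved, relevant):
    """Return (recall, precision) as fractions between 0 and 1."""
    hits = len(retrieved & relevant)
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision

RELEVANT = {1, 3, 4, 7}  # assumed relevance judgements for one question

generic_docs = retrieve(generic_expand("Compressors"))
selective_docs = retrieve(selective_expand("Compressors", ["Axial compressors"]))
```

With these invented figures the generic search retrieves all seven documents (recall 1.0, precision 4/7), while the selective search retrieves four (recall 0.75, precision 0.75). The numbers only illustrate the tendency described above; they are not results from the tests.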
In the testing of the concept hierarchies it was decided to attempt both approaches
and the following different searches were programmed, each one producing a differently
defined class.
(1) The simple natural language concept alone
(2) Confounding of synonyms. It has already been pointed out that a classification
should automatically throw up synonyms as a result of its analysis; also, that a number
of synonyms only become apparent at the level of concepts. Both these factors
operated to produce a programme for synonyms quite different from that using single
terms alone. Examples are: Temperature distribution + Temperature profiles + Temperature
history; Angle of incidence + Angle of attack + Arbitrary angle of attack +
Incidence; Initial expansion region + Prandtl-Meyer region.
(3/8) From this point onwards, the classes formed by (2) were regarded as the basic
classes to be expanded. This expansion was achieved by adding further classes to (2)
on the basis of the following programmes: