CRANV1P1
ASLIB Cranfield Research Project
Factors Determining the Performance of Indexing Systems
Volume 1. Design, Part 1. Text
Chapter: Formation of Index Languages
Cyril Cleverdon, Jack Mills, Michael Keen
Cranfield

An investigation supported by a grant to Aslib by the National Science Foundation. Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

relatively general thesaurus was being applied to a special field. Although some loss of specificity (compared with the natural language) was regarded as inevitable, a considerable extension of the vocabulary was necessary if specificity were not to suffer seriously. This extension raised problems of maintaining consistency with the principles on which the existing vocabulary and its syndetic structure of connectives had been developed. To assist in this, the Rules for Preparing and Updating Engineering Thesauri (4th draft, November 1964) were observed as far as possible.

Selection of terms

Generally speaking, the aim was to incorporate the extra detail as unobtrusively as possible, without disturbing the distinctive character of the E.J.C. index language. The various E.J.C. methods for keeping down the size of the vocabulary were observed where feasible:

(i) Outright rejection of highly specific terms (Rule T-I) when the sense of the term could be approximated with reasonable adequacy by a broader term. E.J.C. omitted a number of prominent aeronautical and aerospace terms which did not appear to meet this criterion (e.g., Sonic boom, Tail, Stall, Bodies, Buffeting, Chord), and these were simply added. It also omitted a very large number of more precise terms and phrases which occurred in the natural language indexing but which did qualify for consideration under this rule. Particularly affected were terms reflecting spatial, dimensional and temporal characteristics, many of which were in adjectival form (which E.J.C.
avoids); e.g., Normal, Perpendicular, Vertical, Horizontal, Behind, Outside, Below, Nearly, Large, High, Circular, Rectangular, Octagonal, Radial, Circumferential, Zero, Rate, Without, Free. In some of these cases, where the notion was obviously dispensable because of its poorness as a retrieval handle, the term was omitted. Examples were Behind, Complete, Continuous, Degree, Direct, Coefficients, Effects, Horizontal, Vertical, Near, Nearly, Normal, Outer and Outside (although some of these appeared in phrases, such as Continuous loading). Outright omission was used cautiously, since it diminishes the exhaustivity of the indexing.

It may be noted that the main reason for holding exhaustivity constant is its effect on recall. However, the absence of a term which is completely 'non-potent' as a retrieval handle will not affect recall except in one circumstance: single-term searching. Theoretically, if a question includes the term Degree or Normal and this single term is searched, it might retrieve a relevant document which would otherwise not be retrieved. This possibility is removed if the term is totally eliminated from the index vocabulary. However, this situation was regarded as sufficiently remote from reality to be ignored.

Strictly speaking, the only condition under which exhaustivity is affected by the index language (as distinct from the indexer's personal decision whether or not to include a notion) is when the language completely fails to provide an appropriate term, even at the highest level of generality. This sometimes occurred with E.J.C., and the solution was simply to use the name of the category to which a term belonged; e.g., the term Shape was used for a whole cluster of natural language terms (Biconvex, Concave, Circular, Configuration, Diamond, Elliptical, Octagonal, Rectangular, etc.). Similarly, the category term Position (location) was used for terms like Beneath, Outboard and Between.
In this way, although specificity suffered, there was no lessening in exhaustivity.
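The two term-handling choices discussed above, outright omission of a non-potent term and substitution of a category term where no suitable term exists, can be illustrated with a small Python sketch. It is hypothetical throughout: the term lists are samples taken from the text, while the function names (`to_index_term`, `index_document`, `single_term_search`) and the toy two-document index are assumptions made for illustration, not part of the E.J.C. thesaurus or the Cranfield procedure.

```python
# Hypothetical sketch, not the Cranfield or E.J.C. procedure: the term lists
# are small samples from the text, and all names here are illustrative.

# 'Non-potent' terms omitted outright (a sample of those listed in the text).
OMITTED_TERMS = {
    "Behind", "Complete", "Continuous", "Degree", "Direct", "Near",
    "Nearly", "Normal", "Outer", "Outside",
}

# Natural-language terms with no suitable index term: the category name is used.
CATEGORY_TERMS = {
    "Biconvex": "Shape", "Concave": "Shape", "Circular": "Shape",
    "Elliptical": "Shape", "Octagonal": "Shape", "Rectangular": "Shape",
    "Beneath": "Position (location)", "Outboard": "Position (location)",
    "Between": "Position (location)",
}


def to_index_term(term):
    """Map one natural-language term to an index term, or None if omitted."""
    if term in OMITTED_TERMS:
        return None  # omission: the notion is lost, so exhaustivity drops
    # Category substitution: specificity drops, but the notion is still indexed.
    # Any other term (e.g. an added term such as "Sonic boom") is kept as is.
    return CATEGORY_TERMS.get(term, term)


def index_document(nl_terms):
    """Translate a document's natural-language terms into a set of index terms."""
    mapped = (to_index_term(t) for t in nl_terms)
    return {t for t in mapped if t is not None}


def single_term_search(index, query_term):
    """Return sorted ids of documents carrying the query's index term."""
    target = to_index_term(query_term)
    if target is None:
        return []  # the term no longer exists in the index vocabulary
    return sorted(doc_id for doc_id, terms in index.items() if target in terms)


if __name__ == "__main__":
    index = {
        "doc1": index_document(["Sonic boom", "Degree", "Biconvex"]),
        "doc2": index_document(["Stall", "Elliptical", "Outboard"]),
    }
    print(sorted(index["doc1"]))                  # ['Shape', 'Sonic boom']
    print(single_term_search(index, "Degree"))    # []
    print(single_term_search(index, "Biconvex"))  # ['doc1', 'doc2']
```

The toy run shows both effects: a single-term search on Degree retrieves nothing once the term has been omitted, while a search on Biconvex also retrieves the document originally indexed by Elliptical, because both collapsed to Shape. The notion is still indexed (exhaustivity kept), but the distinction between the two shapes is lost (specificity reduced).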