CRANV1P1 ASLIB Cranfield Research Project: Factors Determining the Performance of Indexing Systems: VOLUME 1. Design, Part 1. Text Formation of Index Languages chapter Cyril Cleverdon Jack Mills Michael Keen Cranfield An investigation supported by a grant to Aslib by the National Science Foundation. Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

In the same way, there were numerous examples of terms which appeared to represent operations or processes (if one regarded only the single terms in isolation) but which represented an integral part of the specification of a particular kind of thing; e.g. Settling chamber, Driving gas, Non-lifting wing, Geared elevator. Wherever such a term had appeared only in that particular context and its function as a class determinant had been to characterize the entity and not the operation, property, etc., as such, it was subordinated in the hierarchy to the entity which it specified.

The exact status of these variants on insertion into the hierarchies created a slight, theoretical problem. The confounding of synonyms in an earlier programme had already established what terms were exactly synonymous and it would have been inconsistent now to add these variants as synonyms (the weakness of a synonym programme derived before the establishment of a classification has already been noted). So they were simply clustered together as though coordinate in relation to each other. Had the measurements of single-term hierarchical linkage taken the same form as in the later 'concept hierarchies', whereby various hierarchical trails were followed in order to distinguish sharply between different relations (subordinate, superordinate, coordinate, etc.), this might have produced a very slight distortion of the performance figures.
However, the measurement of single-term hierarchies only took the form of block-reductions in vocabulary size (in the manner discussed earlier in this chapter), so no harm was done.

It must be admitted that a few errors crept in, when unjustified violence was done to a category by the subordination of one of its members to another category. For example, in the overwhelming majority of cases, the term Revolution occurred in indexing as part of 'Body of Revolution'; so, according to the reasoning above, it was located in the category of Shape, since its function was to designate a particular kind of shape. However, its synonym, Rotation, occurred once or twice in its fundamental guise of a process; it is therefore misplaced under Shape. It is not thought that these occasional lapses were serious. We have already seen that in making single term hierarchies, relegating a term to a fundamental category sometimes results in classes being drawn in which are unhelpfully associated; this is also what happens in the case of a lapse like the above.

Construction of single term hierarchies

Having settled on the various solutions to the problems described above, the formidable task of organizing the 3094 terms of the natural language proceeded. The basic operation was one of facet analysis (a facet being a hierarchy). A useful framework for the initial sorting was the Facet Classification compiled for the first Aslib-Cranfield Project by J. Farradane and B. C. Vickery, although high speed aerodynamics (the subject of this test collection) tended to concentrate itself in only a few of the areas covered by the scheme, and was in far greater detail than had been handled before. Particularly large categories were those relating to Bodies, to Shapes, and various Spatial and general relations, to Fluid dynamics proper, with particular clusters of detail under such topics as Compressors, Upper atmosphere studies, and Astronautics.
The speed with which the last subject has developed in recent years was reflected in the fact that whereas the Facet Classification barely mentioned it, in this test collection it was a major theme. Because no attempt was made to establish 'fundamental' categories as such, the common categories which were formed tended to be residual ones in that they contained only those terms which had not found a place in a more limited context.
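
The placement rule described above (a term used only to specify one kind of entity is subordinated to that entity, while terms with no such limited context fall into a residual common category) can be illustrated with a minimal sketch. This is not the project's own procedure, which was carried out by hand. The function name `place_term`, the category labels, and the occurrence data are all hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: names and data are assumptions, not taken
# from the Cranfield report.

def place_term(occurrences, residual="Common"):
    """Choose a category for a single term.

    `occurrences` holds one label per occurrence of the term in indexing:
    the category of the entity the term helped to specify, or None when
    the term was used in its own right (e.g. as a process).
    """
    distinct = set(occurrences)
    # Subordinate the term only when every occurrence points to the
    # same entity category and none stands alone.
    if len(distinct) == 1 and None not in distinct:
        return distinct.pop()
    # Otherwise the term has no single limited context: residual category.
    return residual


# Hypothetical occurrence data, loosely modelled on the examples in the text.
occurrences_by_term = {
    "Settling": ["Chamber", "Chamber"],
    "Geared": ["Elevator"],
    "Rotation": ["Shape", "Shape", None],  # also seen as a process
}

placement = {term: place_term(occ) for term, occ in occurrences_by_term.items()}
```

Under a strict rule of this kind, a term such as Rotation, with even one occurrence as a process, would go to the residual category and not under Shape, which is the sort of lapse the text acknowledges.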