CRANV1P1 ASLIB Cranfield Research Project: Factors Determining the Performance of Indexing Systems: VOLUME 1. Design, Part 1. Text Formation of Index Languages chapter Cyril Cleverdon Jack Mills Michael Keen Cranfield An investigation supported by a grant to Aslib by the National Science Foundation. Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

used ones) accounted for 68% of the indexing postings, and 30% accounted for 92% of the postings, after which the curve flattens out.

Reduced vocabularies

Some explanation of the problem of vocabulary reductions referred to above seems desirable. Generally speaking, all recall devices imply a smaller vocabulary (with bigger classes), and precision devices imply a larger vocabulary (with smaller classes). A class is enlarged by confounding two or more classes which previously had a separate existence; contraction is the reverse process.

By 'vocabulary', we mean the total number of discrete indexing elements, lexical and syntactic (i.e., substantives and relational terms) provided in an index language. It may seem surprising that links are included in a statement of vocabulary size, since they are not discrete devices in the sense that they are countable in the way lexical terms and roles are, but vary with the number of documents indexed. However, by the fundamental criterion of whether they define particular classes which would not be distinguished without them, they must be regarded as part of vocabulary size.

It should be noted that vocabulary size, under normal indexing conditions, is not necessarily a determinant of the specificity possible in an index language. This is because increased specificity is always obtainable by coordination; e.g., if the vocabulary contains the terms Flow and Supersonic, class Supersonic flow is specifiable by coordinating these two terms.
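As a rough illustration of coordination, the sketch below treats each index term as a set of document postings and specifies the class Supersonic flow by intersecting the postings for Flow and Supersonic. The document numbers and the `coordinate` helper are hypothetical and are not taken from the project's data.

```python
# Hypothetical postings: each single term maps to the set of documents indexed by it.
postings = {
    "Flow": {1, 2, 3, 5},
    "Supersonic": {2, 3, 4},
    "Boundary layer": {3, 5},
}

def coordinate(postings, *terms):
    """Return the documents posted under every given term (term1 x term2 x ...)."""
    if not terms:
        raise ValueError("at least one term is required")
    result = set(postings[terms[0]])  # copy so the index itself is never modified
    for term in terms[1:]:
        result &= postings[term]
    return result

# The vocabulary has no term 'Supersonic flow', yet the class is still specifiable.
supersonic_flow = coordinate(postings, "Flow", "Supersonic")
print(sorted(supersonic_flow))  # [2, 3]
```

Adding a further term to the conjunction narrows the class again, which is why vocabulary size alone does not fix the specificity attainable when coordination is allowed.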
Theoretically it is possible to specify almost anything in this way; e.g., Air x Cushion x Vehicle is a simple conjunction of the separate terms normally used to name this thing; but even where a name in no way defines the nature of the thing it represents, it may be specified uniquely by a contrived analytical 'definition'; e.g., in the W.R.U. Semantic Code, Tempering is represented by Process x Metal x Heat x (number), where the number is an arbitrary code symbol distinguishing this particular heat process on metal from any other. Perhaps the extreme example of the use of reduced vocabularies, with precise description resting on the various conjunctions of a few fundamental terms, was the Malvern experiment (Ref. 25).

In the case of single-term classes without coordination, however, a reduced vocabulary can be an absolute bar on the specificity possible. If no coordination is used, a single-term vocabulary of 1,500 specifies only half the classes specified by a 3,000 term vocabulary.

So far as the testing of devices is concerned, there are two different ways of effecting the expansion of classes. One is by an absolute reduction of vocabulary, whereby the reduction is obligatory for all searches; the other is by selective search programmes, whereby the effective reduction is permissive and may or may not be utilized in a particular search. In the first case the reduction is measurable (i.e., in terms of the number of discrete classes distinguishable) and in the other it is not.

Obligatory reduction of vocabulary

Here, there is an absolute 'block reduction' (a block of classes being condensed into one) in the number of classes recognized, and the indexer and searcher have no option but to accept the confounding of more specific classes which is implied. This was the case with reduction by synonym-control and by confounding of word forms.
It was also the case with the single-term hierarchies, although reduction by hierarchy may be achieved permissively and was in fact done this way in the testing of 'concept' hierarchies. This point is explained later on.
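The distinction between obligatory and permissive reduction can be sketched as follows. In the obligatory case, terms are confounded at indexing time, so the drop in the number of classes can be counted and no search can separate them again. In the permissive case, the index keeps every term and an individual search may choose to widen a class. The terms, synonym table, hierarchy and function names below are all hypothetical illustrations, not the project's actual test data or procedure.

```python
# Hypothetical postings for four specific terms.
postings = {
    "Aerofoil": {1, 2},
    "Airfoil": {3},
    "Wing": {4, 5},
    "Tailplane": {6},
}

# Obligatory reduction: an assumed synonym table confounds terms at indexing time.
synonyms = {"Airfoil": "Aerofoil"}

def reduce_obligatory(postings, mapping):
    """Merge the postings of confounded terms into one class; the originals are lost."""
    reduced = {}
    for term, docs in postings.items():
        target = mapping.get(term, term)
        reduced.setdefault(target, set()).update(docs)
    return reduced

reduced = reduce_obligatory(postings, synonyms)
# The reduction is measurable as a drop in the number of distinguishable classes.
print(len(postings), "->", len(reduced))  # 4 -> 3

# Permissive reduction: the index is untouched; a search may opt to widen a class
# through an assumed hierarchy of broader term -> narrower terms.
hierarchy = {"Lifting surface": ["Aerofoil", "Airfoil", "Wing", "Tailplane"]}

def search(postings, term, broaden_to=None):
    """Search one term, or every term under a chosen broader class if requested."""
    terms = hierarchy[broaden_to] if broaden_to else [term]
    found = set()
    for t in terms:
        found |= postings.get(t, set())
    return found

print(sorted(search(postings, "Wing")))  # [4, 5]
print(sorted(search(postings, "Wing", broaden_to="Lifting surface")))  # [1, 2, 3, 4, 5, 6]
```

In this sketch the obligatory reduction changes the index for every search, whereas the permissive one leaves the four original classes intact and only affects searches that ask for the broader class.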