CRANV1P1 ASLIB Cranfield Research Project: Factors Determining the Performance of Indexing Systems: VOLUME 1. Design, Part 1. Text Formation of Index Languages chapter Cyril Cleverdon Jack Mills Michael Keen Cranfield An investigation supported by a grant to Aslib by the National Science Foundation. Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

degree of precoordination. It has already been shown how the cluster of concepts around a given term (which might also be a root term for a number of word forms) such as Heat or Interference reflects a variety of relations; e.g., Interference, Blockage; Interference, Forebody; Interference, Jet; these all reflect kinds of interference according to source. Interference filters reflects Interference as an experimental agent (in temperature measurement); Interference load reflects Interference as a source of another phenomenon. When these assorted relations are added to a certain degree of word-form confounding (e.g., expanding an initial enquiry for Dissociation by the addition of classes like Dissociated stream or Dissociating fraction), the result is an eclectic recall device which utilizes elements of hierarchy, non-generic hierarchy, confounding of word-forms, and linking (an element of precoordination is essential to the programme). Such a mixture cannot, however, rank as a 'device' in the way this notion was understood in Chapter 4. It is further considered in the next section.

Formation of Classes by Search Programmes

A significant feature of hierarchical linkage as an indexing device is the rich variety of relations it displays, enabling a number of different paths to be pursued in adjusting the size and content of the class or classes with which a search begins. Some of these paths were briefly mentioned in the last section, using the example of Visualization tests.
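To make the idea of pursuing different relational paths concrete, the following is a minimal sketch in Python and is not part of the original report. The relation labels `kind_by_source`, `experimental_agent` and `source_of_phenomenon`, and the functions `build_index` and `expand`, are hypothetical names introduced for illustration; only the Interference concepts themselves are taken from the text above.

```python
# Illustrative sketch only: relation labels and function names are assumptions,
# not terminology or procedures from the report.
LINKS = [
    ("Interference", "kind_by_source", "Interference, Blockage"),
    ("Interference", "kind_by_source", "Interference, Forebody"),
    ("Interference", "kind_by_source", "Interference, Jet"),
    ("Interference", "experimental_agent", "Interference filters"),
    ("Interference", "source_of_phenomenon", "Interference load"),
]


def build_index(links):
    """Group linked concepts by root term and then by relation label."""
    index = {}
    for root, relation, concept in links:
        index.setdefault(root, {}).setdefault(relation, []).append(concept)
    return index


def expand(index, term, relations):
    """Return the starting term plus every concept linked to it
    through one of the requested relations, in the order given."""
    expanded = [term]
    by_relation = index.get(term, {})
    for relation in relations:
        expanded.extend(by_relation.get(relation, []))
    return expanded


if __name__ == "__main__":
    index = build_index(LINKS)
    # Widen the starting class along a single relation.
    print(expand(index, "Interference", ["kind_by_source"]))
    # Widen it along every relation recorded for the term.
    print(expand(index, "Interference", list(index["Interference"])))
```

Choosing which relations to pass to `expand` corresponds, loosely, to choosing which path to follow when enlarging the class with which a search begins.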
In exploiting these relations two different policies can be followed: either classes are expanded by bringing in all the terms related in a particular way, e.g., all the terms subordinate to the original one, as when all the different kinds of compressors are added to a search for Compressors; or classes are expanded eclectically, choosing just those members of a given relationship which seem most likely to be relevant in the context of the whole question. The latter policy is the one normally followed in the conventional classified index. The former policy has the merit of simplicity in programming (once the schedules are established); this is clearly pertinent in the case of machine searching and is, in fact, generally implied by the term 'generic search'. Equally obvious is the fact that it will tend to result in a lower precision ratio than a selective search, but possibly also a higher recall ratio. In the testing of the concept hierarchies it was decided to attempt both approaches, and the following different searches were programmed, each one producing a differently defined class.

(1) The simple natural language concept alone.

(2) Confounding of synonyms. It has already been pointed out that a classification should automatically throw up synonyms as a result of its analysis; also, that a number of synonyms only become apparent at the level of concepts. Both these factors operated to produce a programme for synonyms quite different from that using single terms alone. Examples are: Temperature distribution + Temperature profiles + Temperature history; Angle of incidence + Angle of attack + Arbitrary angle of attack + Incidence; Initial expansion region + Prandtl-Meyer region.

(3/8) From this point onwards, the classes formed by (2) were regarded as the basic classes to be expanded. This expansion was achieved by adding further classes to (2) on the basis of the following programmes: