CRANV1P1
ASLIB Cranfield Research Project: Factors Determining the Performance of Indexing Systems: Volume 1. Design, Part 1. Text
Indexing Procedures (chapter)
Cyril Cleverdon, Jack Mills, Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation. Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

is the fact that they are rarely, if ever, used alone; whilst all the devices given earlier may be thus used (at least in a postcoordinate system), the non-generic relations are invariably associated with hierarchical linkage, whether this is via a classified index or a syndetic network of connective references. In a thesaurus, for example, generic hierarchical relations and synonym relations are often indicated separately, but individual non-generic relations are never recognized separately. We conclude that there is nothing in the practice of indexing to suggest that a separate evaluation of each non-generic relation is necessary, but the collective contribution to index performance of these relations compared with the contribution of generic hierarchical linkage is a matter of some interest, and it seems reasonable to group them together as a comparable device. It may be noted that generic hierarchical linkage is itself an aggregate of several particular relations, just as this group is. The problem is discussed further in the section on Concept hierarchies in Chapter 5.

(ix) Bibliographic coupling is a device for extending a class x (representing the subject of a particular document q) by accepting all, or some, of the documents which have cited q; or by accepting all documents in a particular universe which have a certain number of citations (6 or 7, say) in common with q.

(x) Associative indexing by machine ('clumps', etc.) The possibilities of automatic indexing now being explored by a number of investigators rest mainly on the assumption that classes useful for retrieval purposes can be established on the basis of the statistical characteristics of the index vocabulary (which may in fact approximate to the complete texts of the documents concerned). By using such features as the frequency of occurrence and co-occurrence of individual words and of particular word-clusters, their position in the text, their relative frequency compared with a standard word-frequency list in the subject area concerned, and so on, associations between terms are established which then form the basis of search programmes. The criteria defining the classes to be examined are thus quite different from any of those listed above and therefore the procedure constitutes an indexing device in its own right. How far it might be feasible to distinguish particular procedures (e.g., the use of one statistical technique rather than another) is as yet uncertain. In particular, the purely statistical methods are in some cases replaced by methods using linguistic analysis, and insofar as these must overlap the 'semantic' devices already described (confounding of word forms, hierarchical linkage of various kinds) they may not merit the status of a discrete and unique index device.

(xi) The "L'Unité" system described by te Nuyl (Ref. 22) is a somewhat exotic device whereby a reduced vocabulary is established in a quite mechanical way by lumping together all the terms in a given sequence of pages in the Concise Oxford Dictionary and treating their aggregate as a single class. As is the case with all drastically reduced vocabularies, it is argued that the theoretical absurdities which might arise (e.g., the appearance of documents on Acne in a search for Aconite, or on Conductivity in a search for Cones) do not arise in fact, since subsequent coordination eliminates them.
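
The page-lumping idea in (xi), and the claim that coordination removes the resulting false drops, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the page numbers in `DICTIONARY_PAGE` are invented and are not the real pagination of the Concise Oxford Dictionary, the block width `PAGES_PER_CLASS` is arbitrary, and the names `reduced_class`, `index_document` and `search` are hypothetical helpers, not part of the system described by te Nuyl.

```python
# Hypothetical page numbers, invented for this sketch; they are NOT the
# real pagination of the Concise Oxford Dictionary.
DICTIONARY_PAGE = {
    "acne": 11,
    "aconite": 11,
    "conductivity": 251,
    "cones": 252,
    "poison": 921,
    "skin": 1135,
}

PAGES_PER_CLASS = 5  # assumed width of each "sequence of pages"


def reduced_class(term):
    """Map a term to the class formed by its block of dictionary pages."""
    return (DICTIONARY_PAGE[term] - 1) // PAGES_PER_CLASS


def index_document(terms):
    """Index a document by the reduced classes of its terms."""
    return {reduced_class(t) for t in terms}


# Two toy documents, indexed only by reduced classes.
DOCUMENTS = {
    "doc_acne": ["acne", "skin"],
    "doc_aconite": ["aconite", "poison"],
}
INDEX = {doc: index_document(terms) for doc, terms in DOCUMENTS.items()}


def search(query_terms):
    """Coordinate search: a document must carry the class of every query term."""
    wanted = {reduced_class(t) for t in query_terms}
    return sorted(doc for doc, classes in INDEX.items() if wanted <= classes)


# A single-term search retrieves the false drop, because "acne" and
# "aconite" fall in the same block of pages.
single = search(["aconite"])

# Coordinating with a second term eliminates it.
coordinated = search(["aconite", "poison"])
```

Under these invented page numbers, `single` contains both documents while `coordinated` contains only `doc_aconite`. Whether coordination removes false drops as reliably in a real collection is an empirical question the sketch does not settle.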
Any reduced vocabulary may be regarded as a recall device, in that it implies enlargement (by coalescence) of the classes which are formed initially by the individual index terms assigned to a document. Usually, reduced vocabularies are formed