Aslib Cranfield Research Project: Factors Determining the Performance of Indexing Systems: Volume 2

Chapter: Test Environment

Cyril Cleverdon, Michael Keen
Cranfield

An investigation supported by a grant to Aslib by the National Science Foundation. Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

CONCEPT INDEXING
  1. Manual indexing, at three levels of exhaustivity
  2. Natural language abstracts and titles

INDEX LANGUAGES
  1. Single terms
  2. Simple concepts
  3. Controlled terms
  4. Abstracts and titles
  5. Recall devices
     a. Single term indexing, eight languages
     b. Simple concept indexing, fifteen languages
     c. Controlled term indexing, six languages
     d. Abstracts and titles, four languages
  6. Precision devices
     a. Single term indexing, four types
     b. Simple concept indexing, one type
     c. Controlled term indexing, two types
     d. Abstracts and titles, one type

SEARCH RULES
  1. Coordination levels, all possible levels
  2. Combination rules, six types

FIGURE 2.2 SOFTWARE FACTORS EXAMINED IN TEST

Concept Indexing

The manual indexing carried out on the document collection is described in Chapter 4 of Volume 1, and this constituted the main body of data tested; of particular importance was the fact that three levels of exhaustivity of indexing were distinguished. The results of this variation in exhaustivity have been evaluated on the single term languages, but not on the simple concept or controlled term languages.

In addition, Professor Salton prepared (with the SMART programme) a KWIC type index of the titles and abstracts of 200 documents (subset 1). In this connection abstracts and titles can be considered as variant forms of concept indexing, and the test searches which were made allowed a direct comparison with the manual indexing carried out by the project staff.

Data concerning the usage of terms in the single term language are given in Fig. 5.1 of Volume 1. Some additional information on term usage is given in Fig. 2.3 for the simple concept and controlled term languages, for which the average postings per document are 18 and 24 respectively. Fig. 2.4 gives similar data for the abstracts, where the average postings of key terms per document is 74. This last figure is not strictly comparable with the others, since in the abstracts the same word may be 'posted' several times for the same document.
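The caveat about the abstracts figure can be illustrated with a small sketch. It assumes, for illustration only, that each document is held as a list of its posted terms in order of occurrence; the function name average_postings, the distinct flag, and the toy documents are hypothetical and are not taken from the Cranfield data. Counting every occurrence gives a higher average than counting each term once per document, which is why the two kinds of figure are not directly comparable.

```python
def average_postings(docs, distinct=True):
    """Return the mean number of postings per document.

    docs: mapping of document id -> list of posted terms (hypothetical layout).
    distinct=True counts each term once per document;
    distinct=False counts every occurrence, as can happen with abstract text.
    """
    counts = [
        len(set(terms)) if distinct else len(terms)
        for terms in docs.values()
    ]
    return sum(counts) / len(counts)


# Toy data for illustration only; not drawn from the test collection.
abstracts = {
    "doc1": ["boundary", "layer", "flow", "boundary", "layer"],
    "doc2": ["wing", "flow", "flow"],
}

with_repeats = average_postings(abstracts, distinct=False)   # 4.0
once_per_doc = average_postings(abstracts, distinct=True)    # 2.5
print(with_repeats, once_per_doc)
```

On this toy input the repeated-occurrence average is 4.0 against 2.5 for distinct terms. The size of the gap on the real abstracts is not stated in the text and is not implied by this sketch.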