CRANV2
Aslib Cranfield Research Project: Factors Determining the Performance of Indexing Systems: Volume 2
Test Environment
chapter
Cyril Cleverdon
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
CONCEPT INDEXING
1. Manual indexing, at three levels of exhaustivity
2. Natural language abstracts and titles
INDEX LANGUAGES
1. Single terms
2. Simple concepts
3. Controlled terms
4. Abstracts and titles
5. Recall devices
   a. Single term indexing, eight languages
   b. Simple concept indexing, fifteen languages
   c. Controlled term indexing, six languages
   d. Abstracts and titles, four languages
6. Precision devices
   a. Single term indexing, four types
   b. Simple concept indexing, one type
   c. Controlled term indexing, two types
   d. Abstracts and titles, one type
SEARCH RULES
1. Coordination levels, all possible levels
2. Combination rules, six types
FIGURE 2.2 SOFTWARE FACTORS EXAMINED IN TEST
Concept Indexing
The manual indexing carried out on the document collection is
described in Chapter 4 of Volume 1, and this constituted the main body
of data tested; of particular importance was the fact that three levels
of exhaustivity of indexing were distinguished. The results of this variation
in exhaustivity have been evaluated on the single term languages, but not
on the simple concept or controlled term languages. In addition, Professor
Salton prepared (with the SMART programme) a KWIC type index of the
titles and abstracts of 200 documents (subset 1); in this connection
abstracts and titles can be considered as variant forms of concept indexing,
and the test searches made on them allowed direct comparison
with the manual indexing carried out by the project staff.
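As an illustration of what a KWIC (keyword in context) type index of a title looks like, the following is a minimal sketch in Python. It is not the procedure used by the SMART programme, whose details are not given here; the function name `kwic`, the stopword list, and the sample title are all hypothetical and chosen only for the example.

```python
# Minimal KWIC sketch (hypothetical; not the SMART programme's method).
# Each significant word of a title becomes an index entry, shown with
# the words that precede and follow it.

# Assumed stopword list, for illustration only.
STOPWORDS = frozenset({"a", "an", "and", "in", "of", "on", "the"})


def kwic(title, stopwords=STOPWORDS):
    """Return (key, left context, keyword, right context) tuples,
    sorted alphabetically by key."""
    words = title.split()
    entries = []
    for i, word in enumerate(words):
        key = word.lower().strip(".,;:()")
        if not key or key in stopwords:
            continue  # skip function words and bare punctuation
        left = " ".join(words[:i])
        right = " ".join(words[i + 1:])
        entries.append((key, left, word, right))
    entries.sort(key=lambda entry: entry[0])
    return entries


if __name__ == "__main__":
    for key, left, word, right in kwic("Flow in the boundary layer"):
        print(f"{left:>20} | {word} | {right}")
```

Every word not in the stopword list yields one entry, so a single title or abstract contributes many index entries without any intellectual selection by an indexer.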
Data concerning the usage of terms in the single term language is
given in Fig. 5.1 of Volume 1; some additional information on term
usage is given in Fig. 2.3 in relation to the simple concept and controlled
term languages, the average postings per document being 18 and 24
respectively. Fig. 2.4 gives similar data for the abstracts, with the
average postings of key terms being 74. This latter figure is not strictly
comparable, since the same word may be 'posted' several times for the
same document.
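The difference between the two ways of counting postings can be made concrete with a short sketch. The Python below is an illustration only: the function name `average_postings` and the toy index are hypothetical and are not data from the test. It shows how the average rises when repeated occurrences of a word within one document are each counted as a posting, as with the abstracts, rather than once per document, as with manually assigned terms.

```python
# Sketch of average postings per document under two counting rules.
# The toy index below is invented for illustration; it is not test data.

def average_postings(index, count_repeats=False):
    """index maps a document id to the list of terms posted to it.

    count_repeats=False counts each distinct term once per document;
    count_repeats=True counts every occurrence, so a word appearing
    several times in one abstract is 'posted' several times.
    """
    if not index:
        return 0.0
    total = 0
    for terms in index.values():
        total += len(terms) if count_repeats else len(set(terms))
    return total / len(index)


toy_index = {
    "doc1": ["flow", "boundary", "layer", "flow"],  # 'flow' occurs twice
    "doc2": ["wing", "flow"],
}

if __name__ == "__main__":
    print(average_postings(toy_index))                      # distinct terms
    print(average_postings(toy_index, count_repeats=True))  # all occurrences
```

With the toy index the first call gives 2.5 and the second 3.0, which is why a figure obtained by counting repeats cannot be set directly against one obtained from distinct index terms.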