CRANV2
Aslib Cranfield Research Project: Factors Determining the Performance of Indexing Systems: Volume 2
Test Environment
chapter
Cyril Cleverdon
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
(coordination level 4), and the ratios at the top left corner examined
(28% recall, 2% precision), the following variables are shown to have
produced that result: a search at coordination level of four terms;
search rule A (any combination); precision device 'a' (no linking in the
index language); relevant documents graded 1 only accepted; recall
language 1 (natural language terms); and indexing exhaustivity 1
(low exhaustivity). After this, a move across this section of the table
to the right will first alter the document relevance grades, then
introduce a search rule, then include the three precision devices and
finally test three more search rules. A move into the next section will
increase the coordination level of the search, and in any section a
move down the table will increase the indexing exhaustivity before
a new recall language is brought in.
The position of these variables in the table is of no significance;
the table could, for instance, first have been divided into the five recall
languages, with the seven coordination levels repeated at each stage,
etc., and hundreds of variations are possible. The actual combinations
of different variables for which results have been presented in the complete
composite table total 609, which is a choice of the most useful
combinations out of the theoretical total of 6720 combinations possible.
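
The arithmetic behind these figures can be checked with a short sketch. Only the two dimensions whose sizes are stated in the text (five recall languages, seven coordination levels) are enumerated; the sizes of the remaining variables are not assumed here, so the code only reports the factor they must jointly contribute. The variable names are illustrative and do not come from the report.

```python
from fractions import Fraction
from itertools import product

# Dimensions whose sizes are stated in the text.
RECALL_LANGUAGES = range(1, 6)       # five recall languages
COORDINATION_LEVELS = range(1, 8)    # seven coordination levels

# Totals quoted in the text.
THEORETICAL_TOTAL = 6720
PRESENTED = 609

# One possible ordering of the table: recall language first,
# with every coordination level repeated under each language.
pairs = list(product(RECALL_LANGUAGES, COORDINATION_LEVELS))

# Factor that the other variables (search rule, precision device,
# relevance grade, exhaustivity) must jointly contribute.
remaining_factor = Fraction(THEORETICAL_TOTAL, len(pairs))

# Share of the theoretical combinations actually presented.
share_presented = PRESENTED / THEORETICAL_TOTAL

print(len(pairs), remaining_factor, f"{share_presented:.1%}")
```

Reordering the arguments to `product` changes only the order in which combinations appear, not their number, which is the sense in which the position of the variables in the table is of no significance.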
Each set of recall and precision ratios is an average of results
from the set of 35 questions and it is estimated that the composite
table represents more than 16,000 individual results. When it is
considered that the scope of the whole project extends to 221 questions,
that there are some 28 other index languages which are not included in
this table and that there are a number of other new variables, the
individual results available are estimated to exceed 200,000.
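
As a minimal sketch of how one averaged pair of ratios might be formed from a set of questions, the following pools the counts over all questions before dividing. This pooling is an assumption made for illustration; the method of totalling actually adopted is the subject of Chapter 3. The class name, function name and sample counts are all invented.

```python
from dataclasses import dataclass


@dataclass
class SearchResult:
    """Counts for one question under one combination of variables."""
    relevant_retrieved: int   # relevant documents retrieved
    total_retrieved: int      # all documents retrieved
    total_relevant: int       # relevant documents in the collection


def pooled_ratios(results):
    """Return (recall %, precision %) from counts summed over questions.

    Assumption: counts are pooled first and the ratios taken once,
    rather than averaging per-question ratios.
    """
    rel_ret = sum(r.relevant_retrieved for r in results)
    retrieved = sum(r.total_retrieved for r in results)
    relevant = sum(r.total_relevant for r in results)
    recall = 100.0 * rel_ret / relevant if relevant else 0.0
    precision = 100.0 * rel_ret / retrieved if retrieved else 0.0
    return recall, precision


# Invented counts for three questions, chosen only to give round figures.
sample = [
    SearchResult(relevant_retrieved=2, total_retrieved=50, total_relevant=5),
    SearchResult(relevant_retrieved=1, total_retrieved=100, total_relevant=7),
    SearchResult(relevant_retrieved=4, total_retrieved=200, total_relevant=13),
]
recall, precision = pooled_ratios(sample)
print(recall, precision)
```

Averaging the per-question ratios instead would in general give different figures, which is one reason the choice of totalling method matters.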
Environmental Factors
The main environmental factors involved in the testing are listed
in Fig. 2.11. For various reasons, as the test proceeded, different
sets of questions and collections of different sizes were used. Consider
first the sets of questions. Although 279 questions were available
for use, the largest set for which results are presented numbers 221.
The balance of 58 were multi-themed questions, that is, they really
consisted of more than one question, e.g. Question 3 'How can one
describe the aerodynamic forces and the heating rates acting on high
speed aircraft'. Four of these were used in some of the smaller
question sets only. The first series of tests, on the recall devices
of the single-term index languages, were made on the complete collection
of 221 single-theme questions. The major problem that then arose was
to find a satisfactory method of totalling the results of searches based
on different numbers of starting terms (this matter is considered at length
in Chapter 3). For this reason, we investigated the results on a set of
35 questions each of which had seven starting terms. The tests on
interfixing and partitioning were particularly difficult to do, because of
the painstaking clerical work necessary. These were therefore done on