CRANV2
Aslib Cranfield Research Project: Factors Determining the Performance of Indexing Systems: Volume 2
Main test results
chapter
Cyril Cleverdon
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
(4) The relevance of the documents to the questions was assessed by
the questioners in four grades of relevance (see Vol. I, p.21). Most
of the test results are given for documents of all grades of relevance
(shown as 1-4), except in Section 4.6, which specifically deals
with the effect of relevance.
(5) The main test collection had 1,400 documents, but two other
document sets were used. These were of 200 and 350 documents,
a characteristic of these smaller sets being that all the documents
were concerned with aerodynamics, whereas the main set contained
some 300 documents on theory of aircraft structures.
(6) The largest set of questions in the test had 221 questions.
Most of the results are based on a subset of 42 questions, all of
which were concerned with aerodynamics. Another subset had 35
questions, the characteristic of which was that each question had
seven starting terms. Other sets were used in special cases;
full details are given in Figure 2.12, and also in the appropriate
section of the tables.
(7) The number of relevant documents will vary with the document
set, the question set and the relevance grade. This number must be
known for calculating the recall ratio.
(8) The generality number is a function of the number of relevant
documents and the number of documents in the test collection. With
Question Subset 2 (for which there are 198 relevant documents), when
the search is made on the 1400 document collection, the generality
number is 3.4. When searched on the 200 document collection, the
generality number is 23.6. The effect of the increase in generality
number is to bring about an apparent improvement in the performance
figures. The matter of generality is fully discussed in Chapter 3.
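
The two generality figures quoted in paragraph (8) can be checked with a short calculation. This is a sketch only, under assumptions not stated in this section: that the generality number is the number of relevant documents per 1,000 question-document pairs, that Question Subset 2 is the 42-question subset mentioned in paragraph (6), and that all 198 relevant documents are present in both collections. The function name generality_number is hypothetical.

```python
def generality_number(relevant_total, n_questions, collection_size):
    """Relevant documents per 1,000 question-document pairs.

    Assumed definition; the report defines generality in Chapter 3.
    """
    return 1000 * relevant_total / (n_questions * collection_size)


# Assumption: Question Subset 2 has 42 questions and 198 relevant documents.
g_main = generality_number(198, 42, 1400)   # 1,400-document collection
g_small = generality_number(198, 42, 200)   # 200-document collection

print(round(g_main, 1), round(g_small, 1))
```

Under these assumptions the first value rounds to 3.4, matching the figure quoted for the 1,400-document collection, and the second rounds to 23.6.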
(9) All the test results given in this chapter were based on searches
where the coordination level was progressively decreased from the
maximum down to a single term. The maximum number of single terms
in any question was 15, while the lowest number of terms was 2.
(10) In most situations, for the reason stated in the previous paragraph,
the number of questions which can be searched at a given coordination
level will be limited by the number of questions having that number
of starting terms. This information is given in column z, which shows,
for example, that at a coordination level of 6, there are 164 questions
which, having six or more starting terms, can be searched at this
level.
In certain searches, the number of questions actually searched at
a given coordination level was less than the theoretical maximum possible.
This was because of the large clerical effort required; the number
of questions actually searched is given in column y.
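
The counts in column z described in paragraph (10) follow directly from the number of starting terms in each question: a question can be searched at a coordination level only if it has at least that many starting terms. The sketch below illustrates the counting rule. The function name searchable_counts and the list of term counts are hypothetical illustrations, not data from the test.

```python
def searchable_counts(starting_terms_per_question):
    """For each coordination level, count the questions that have at
    least that many starting terms (the quantity shown in column z)."""
    max_terms = max(starting_terms_per_question)
    return {
        level: sum(1 for n in starting_terms_per_question if n >= level)
        for level in range(1, max_terms + 1)
    }


# Hypothetical numbers of starting terms for six questions.
term_counts = [2, 3, 6, 6, 7, 15]
z = searchable_counts(term_counts)

print(z[6])   # questions searchable at coordination level 6
```

With these illustrative counts, four questions have six or more starting terms, so four can be searched at level 6. The number actually searched (column y) can never exceed this figure.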