CRANV1P1
ASLIB Cranfield Research Project: Factors Determining the Performance of Indexing Systems: VOLUME 1. Design, Part 1. Text
General Considerations
chapter
Cyril Cleverdon
Jack Mills
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
1. No significant improvement in indexing is likely beyond an indexing time of four
minutes (which is taken to be equal to about seven minutes in a real-life situation).
2. Trained indexers are able to do consistently good indexing although they lack sub-
ject knowledge.
3. Indications are that information-retrieval systems are operating normally at a
recall ratio between 70% and 90% and in the range of 8% to 20% precision ratio.
4. There is an optimum level of exhaustivity of indexing. To index beyond this limit
will do little to improve recall ratio but will seriously weaken the precision ratio.
5. There is an inevitable inverse relationship between recall and precision.
6. Within the normal operating range of a system, a 1% improvement in precision
will result in a 3% drop in recall.
7. The most significant result of the main test program was that all four indexing
methods were operating at about the same level of recall performance.
In some published comments on Swanson's paper (Ref. 4A) it was suggested that
the following points should be considered in addition to those listed above.
8. The most important factors to be measured in the evaluation of information retri-
eval systems are recall and precision.
9. The physical form of the store has no effect on the efficiency of the system with
regard to recall and precision.
10. The index language has a relatively minor effect on the operational performance
of an information retrieval system. The main influence is the intellectual stage of
concept-indexing.
11. Given the same concept-indexing, any two or more kinds of index languages will
be potentially capable of similar performance in regard to recall and precision.
12. The more complex an index language (i.e., the more devices it incorporates),
the greater the range of performance in regard to recall and precision.
13. Maximum recall is dependent on exhaustivity of indexing; maximum precision is
dependent on the specificity of the index language.
Of the above, numbers 1, 2, 3 and 6 were presented with the qualification that
they only applied to the set of documents and set of questions that were investigated,
namely a collection in the general subject area of engineering, metallurgy and physics.
The remainder appeared to be of general application, and numbers 4, 5, 7, 8, 11, 12, and
13 in particular formed the basis of the present work. It is not suggested that all these
hypotheses were new; it was merely that, with the results from Cranfield I, experi-
mental data were now available which appeared to justify them.
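
The two measures on which these hypotheses turn can be illustrated with a minimal sketch. It assumes the usual definitions, which are not restated in this passage: the recall ratio is the proportion of the relevant documents that a search retrieves, and the precision ratio is the proportion of the retrieved documents that are relevant. The function names, document numbers and set sizes below are hypothetical and are chosen only so that the figures fall inside the operating range quoted in hypothesis 3.

```python
def recall_ratio(retrieved, relevant):
    """Proportion of the relevant documents that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    if not relevant:
        raise ValueError("no relevant documents for this question")
    return len(retrieved & relevant) / len(relevant)


def precision_ratio(retrieved, relevant):
    """Proportion of the retrieved documents that are relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    if not retrieved:
        raise ValueError("the search retrieved nothing")
    return len(retrieved & relevant) / len(retrieved)


# Hypothetical question with ten relevant documents, numbered 1 to 10.
relevant = set(range(1, 11))

# Hypothetical narrow search: 8 relevant and 42 non-relevant documents.
narrow = set(range(1, 9)) | set(range(101, 143))

# Hypothetical broader search: 9 relevant and 91 non-relevant documents.
broad = set(range(1, 10)) | set(range(101, 192))

for name, retrieved in (("narrow", narrow), ("broad", broad)):
    r = recall_ratio(retrieved, relevant)
    p = precision_ratio(retrieved, relevant)
    print(f"{name}: recall {r:.0%}, precision {p:.0%}")
```

In this invented case, broadening the search raises recall from 80% to 90% while precision falls from 16% to 9%. The figures are illustrative only and are not results from the test program.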
Possibly the point which has attracted most attention and criticism has been in
regard to the assertion that there is an inevitable inverse relationship between recall
and precision. This, in other words, implies that if an attempt is made to retrieve