CRANV1P1
ASLIB Cranfield Research Project: Factors Determining the Performance of Indexing Systems: VOLUME 1. Design, Part 1. Text
General Considerations
chapter
Cyril Cleverdon
Jack Mills
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
more relevant documents, one is forced to accept a proportionately larger number
of non-relevant documents. Alternatively, if it is desired to restrict the number
of non-relevant documents, this can only be done at the cost of also missing some of
the relevant documents. Our experience, backed by the results of tests carried out
by a number of other investigators, leads us to believe that this is a fact. However,
until the further evidence of some 120,000 searches has been published in a later
volume, we will, to avoid argument, call it a hypothesis.
Instead of the form in which it is stated in (5) above, it would be more precise
if it were stated as follows: Within a single system, assuming that a sequence of
sub-searches for a particular question is made in the logical order of expected
decreasing precision and the requirements are those stated in the question, there
is an inverse relationship between recall and precision, if the results of a number
of different searches are averaged.
There are here four qualifications to the original statement. Concerning the
logical order of sub-searches, assume the request is for information on Siamese
cats. A reasonably logical order of sub-searches might be
A  Siamese cats
B  Domestic cats
C  Domestic pets
D  Wild cats
E  Cats
F  Felidae
G  Lions
In such a case the inverse relationship would be expected to hold. However, if one
first searched under 'Lions', it might reasonably be expected that the recall ratio
and the precision ratio would be very low, so that going next to 'Siamese cats' would
improve both recall and precision. This qualification is therefore only put in to cover
the somewhat absurd situation suggested, and can hardly be said to weaken the basic
assertion, any more than can the point that the requirements are those stated in the
question. This is to cover the situation when the questioner asks for information on
Pekinese dogs and, when presented with the output, says that he really required
information on Siamese cats. In a very much more subtle way, this situation frequently
occurs in operational systems; what is really happening is that a new question is
being put to the system.
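The effect of sub-search order in the Siamese cats illustration can be sketched numerically. The following is a minimal sketch only: the document numbers, the output of each sub-search, and the function names `ratios` and `cumulative_ratios` are all hypothetical, and recall and precision are computed on the pooled output of all sub-searches made so far.

```python
def ratios(retrieved, relevant):
    """Return (recall, precision) for a set of retrieved document ids."""
    hits = len(retrieved & relevant)
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision


def cumulative_ratios(sub_searches, relevant):
    """Recall and precision after each sub-search, pooling all output so far."""
    pooled = set()
    steps = []
    for label, docs in sub_searches:
        pooled |= docs
        steps.append((label, *ratios(pooled, relevant)))
    return steps


# Hypothetical collection: documents 1-4 are relevant to "Siamese cats".
relevant = {1, 2, 3, 4}

# Hypothetical output of each sub-search (invented for illustration).
output = {
    "Siamese cats": {1, 2, 9},
    "Domestic cats": {3, 10, 11, 12},
    "Cats": {4, 13, 14, 15, 16, 17},
    "Lions": {20, 21, 22, 23},
}

# Order of expected decreasing precision, versus the "absurd" order.
logical = [(k, output[k]) for k in ("Siamese cats", "Domestic cats", "Cats", "Lions")]
absurd = [(k, output[k]) for k in ("Lions", "Siamese cats")]

for name, order in (("logical", logical), ("absurd", absurd)):
    print(name)
    for label, recall, precision in cumulative_ratios(order, relevant):
        print(f"  {label:14s} recall={recall:.2f} precision={precision:.2f}")
```

With these invented figures, the logical order shows recall rising while precision falls at every step, whereas beginning with 'Lions' gives zero for both ratios, so that the move to 'Siamese cats' raises both together.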
In single cases there may be exceptions to the general rule, particularly in
the case where, although there is at least one, there are relatively few relevant
documents. In such a situation, the first sub-search may well fail to produce a
relevant document, so at this stage the result can only be described as 0% recall
and 0% precision. The finding of a single relevant document in a later sub-search
will obviously improve both precision and recall so, for complete accuracy, it is
necessary to add the qualification that the results of a number of searches should
be averaged.
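The need for averaging can be illustrated with a small calculation. This is a sketch under stated assumptions: the three questions and their counts are invented, the name `step_ratios` is hypothetical, and the averaging method (the mean of the per-question ratios at each sub-search level) is one possible reading of "averaged", not necessarily the procedure used in the project.

```python
from statistics import mean


def step_ratios(steps, total_relevant):
    """Cumulative (recall, precision) after each sub-search.

    `steps` holds (documents retrieved, relevant among them) per sub-search.
    """
    retrieved = hits = 0
    out = []
    for n_docs, n_relevant in steps:
        retrieved += n_docs
        hits += n_relevant
        precision = hits / retrieved if retrieved else 0.0
        out.append((hits / total_relevant, precision))
    return out


# Hypothetical questions: (total relevant in collection, per-sub-search counts).
questions = [
    (1, [(3, 0), (4, 1), (10, 0)]),  # sparse case: first sub-search finds nothing
    (8, [(4, 3), (6, 3), (20, 2)]),
    (5, [(2, 2), (5, 2), (15, 1)]),
]

per_question = [step_ratios(steps, total) for total, steps in questions]

# Average recall and precision across questions at each sub-search level.
averaged = [
    (mean(r for r, _ in level), mean(p for _, p in level))
    for level in zip(*per_question)
]

for i, (recall, precision) in enumerate(averaged, start=1):
    print(f"sub-search {i}: mean recall={recall:.2f} mean precision={precision:.2f}")
```

In this invented data the first question, taken alone, breaks the rule: it moves from 0% recall and 0% precision to a higher value of both. Once the three questions are averaged at each level, mean recall rises and mean precision falls, as the general statement requires.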
The final qualification, "within a single system", is more difficult to discuss at
present, because the question of what constitutes a "single system" is fundamental to
the project considered in this volume. It could be said that we have been endeavouring
to find how changing a component (e.g. any variable) in a sub-system (e.g. an index
language) of a complete I.R. system can improve both recall and precision. This
point also came to the fore in connection with the test results obtained by Professor
Salton with the SMART Programme (ref. 30) where a number of different "options" -