CRANV2
Aslib Cranfield Research Project: Factors Determining the Performance of Indexing Systems: Volume 2
Methods for presentation of results
chapter
Cyril Cleverdon
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
The fallout ratio would now be 5437/58602 = 9.278%.
This fallout is now identical with that of collection A in Fig. 3.33T; it
should be noted, however, that these figures would result in the precision
ratio falling from 3.2% to 2.4%.
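
As a check on the arithmetic, the fallout figure quoted above can be recomputed with a minimal Python sketch. It assumes the usual definition of fallout (non-relevant documents retrieved divided by all non-relevant documents, summed over the questions); the function name is illustrative and does not come from the report.

```python
def fallout_ratio(nonrelevant_retrieved: int, nonrelevant_total: int) -> float:
    """Return fallout as a percentage: non-relevant retrieved / all non-relevant."""
    if nonrelevant_total <= 0:
        raise ValueError("nonrelevant_total must be positive")
    return 100.0 * nonrelevant_retrieved / nonrelevant_total


# Figures quoted in the text: 5437 over 58602.
print(f"{fallout_ratio(5437, 58602):.3f}%")  # prints 9.278%
```
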
One has various options as to how to correct the precision ratio
according to generality; it is possible to convert A to B (i.e. 23.6
to 3.4), B to A (i.e. 3.4 to 23.6) or to take a figure intermediate
between A and B, such as 11. These three possible changes would
result in the following figures:
              Uncorrected   Adjusted Precision Ratio
              Precision     ------------------------------   Fallout
              Ratio         G = 3.4    G = 23.6    G = 11     Ratio
Collection A  14.8%         2.4%       14.8%       7.3%       9.278%
Collection B   3.2%         3.2%       19.0%       9.7%       6.798%
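
One way to reproduce the adjusted precision figures is sketched below in Python. This is not taken from the report. It assumes that the generality number G counts relevant documents per 1000, and that the adjustment holds recall and fallout fixed while only G changes. Under those assumptions the precision odds P/(1 - P) scale in proportion to G/(1000 - G). The function name and its parameters are illustrative.

```python
def adjust_precision(precision: float, g_from: float, g_to: float) -> float:
    """Re-express a precision ratio (a fraction in 0..1) at a new generality number.

    Assumption: recall and fallout stay fixed, so the precision odds scale
    with G / (1000 - G), where G is relevant documents per 1000.
    """
    if not 0.0 < precision < 1.0:
        raise ValueError("precision must lie strictly between 0 and 1")
    odds = precision / (1.0 - precision)
    scale = (g_to / (1000.0 - g_to)) / (g_from / (1000.0 - g_from))
    new_odds = odds * scale
    return new_odds / (1.0 + new_odds)


# Uncorrected precision and generality number for each collection (from the text).
collections = {"A": (0.148, 23.6), "B": (0.032, 3.4)}
for name, (p, g) in collections.items():
    adjusted = [100.0 * adjust_precision(p, g, target) for target in (3.4, 23.6, 11.0)]
    print(name, [f"{value:.1f}%" for value in adjusted])
```

With the inputs rounded as shown, the results agree with the tabulated figures to within about 0.1 of a percentage point; the small differences presumably reflect rounding in the published values.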
Whereas the uncorrected precision ratio shows A to be superior,
all adjusted precision ratios show B to be superior. To discover which
factor, in terms of the two collections, causes the difference
in performance, Collection A will be taken as giving the expected result,
and we will investigate the reasons why B should show the improved
performance after the precision ratio has been adjusted.
The problem is why, with collection B, fewer non-relevant documents
are retrieved than expected. This can be explained by saying that there is
more diversification in the indexing terms (and, therefore, presumably of
the subject) of some of the documents in the larger file in relation to the
search terms of the questions. The 42 questions in the test were all
specifically on aerodynamics, as were all the 200 documents in collection
A. However, it is known that 257 of the documents in collection B were
included in relation to questions dealing with the theory of aircraft
structures; if it is assumed that these were never retrieved by any of the
42 questions on aerodynamics, then this would reduce N for collection B
from 1400 to 1143, which is shown as B1 in Fig. 3.34T, where the new
generality number and fallout ratio are given. The fallout, at 8.333%,
is now closer to, but still does not reach, the level for collection A.
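
The effect of shrinking N on the fallout ratio can be illustrated with a short Python sketch. The two counts used here are not stated in this passage; they are back-calculated assumptions: 198 relevant question-document pairs (42 x 1400 - 58602) and 3984 non-relevant retrievals (about 6.798% of 58602). Following the text, the excluded documents are assumed never to have been retrieved, so only the denominator changes.

```python
QUESTIONS = 42
RELEVANT_PAIRS = 198       # assumption: back-calculated as 42 * 1400 - 58602
NONREL_RETRIEVED = 3984    # assumption: back-calculated as about 6.798% of 58602


def fallout_for_size(n_docs: int) -> float:
    """Fallout (%) if the collection holds n_docs documents and retrievals are unchanged."""
    nonrelevant_total = QUESTIONS * n_docs - RELEVANT_PAIRS
    return 100.0 * NONREL_RETRIEVED / nonrelevant_total


def generality_number(n_docs: int) -> float:
    """Relevant documents per 1000, taken over all question-document pairs."""
    return 1000.0 * RELEVANT_PAIRS / (QUESTIONS * n_docs)


for label, n in (("B", 1400), ("B1", 1143)):
    print(f"{label}: N={n}  G={generality_number(n):.1f}  fallout={fallout_for_size(n):.3f}%")
```

Under these assumptions the sketch gives a fallout of 6.798% for N = 1400 and 8.333% for N = 1143, matching the figures quoted in the text.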
It is therefore clear that if the performances are to be equated, it
is necessary to hypothesise that in collection B there is a further subset
of documents which are not retrieved by the questions. This number can
be found by calculating the size of a hypothetical collection, B2, which
would result in a performance identical to that of collection A; the size of this