CRANV2 Aslib Cranfield Research Project: Factors Determining the Performance of Indexing Systems: Volume 2 Methods for presentation of results chapter Cyril Cleverdon Michael Keen Cranfield An investigation supported by a grant to Aslib by the National Science Foundation. Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

The fallout ratio would now be 5437/58602 = 9.278%. This fallout is now identical with that of collection A in Fig. 3.33T; it should be noted, however, that these figures would result in the precision ratio falling from 3.2% to 2.4%.

One has various options as to how to correct the precision ratio according to generality; it is possible to convert A to B (i.e. 23.6 to 3.4), B to A (i.e. 3.4 to 23.6), or to take a figure intermediate between A and B, such as 11. These three possible changes would give the following figures:

                Uncorrected        Adjusted Precision Ratio       Fallout
                Precision Ratio    G = 3.4   G = 23.6   G = 11    Ratio
Collection A    14.8%              2.4%      14.8%      7.3%      9.278%
Collection B     3.2%              3.2%      19.0%      9.7%      6.798%

Whereas the uncorrected precision ratio shows A to be superior, all adjusted precision ratios show B to be superior. To discover which factor, in terms of the two collections, causes the difference in performance, collection A will be taken as giving the expected result, and we will investigate the reasons why B should show the improved performance after the precision ratio has been adjusted.

The problem is why, with collection B, fewer non-relevant documents are retrieved than expected. This can be explained by saying that there is more diversification in the indexing terms (and, therefore, presumably of the subject) of some of the documents in the larger file in relation to the search terms of the questions. The 42 questions in the test were all specifically on aerodynamics, as were all the 200 documents in collection A.
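
The arithmetic behind the figures tabulated above can be sketched as follows. This is an illustration under stated assumptions, not the procedure given in the report: it assumes the generality number G is the number of relevant documents per 1000, and that recall and fallout are held fixed while generality changes, in which case the standard definitions imply that the odds of precision scale in proportion to the odds of generality. The function names fallout_pct and adjust_precision are hypothetical.

```python
def fallout_pct(nonrelevant_retrieved, nonrelevant_total):
    """Fallout ratio as a percentage: non-relevant retrieved / all non-relevant."""
    return 100.0 * nonrelevant_retrieved / nonrelevant_total


def adjust_precision(precision_pct, g_from, g_to):
    """Convert a precision ratio from generality g_from to generality g_to.

    Assumptions (not stated in the text): G is relevant documents per 1000,
    and recall and fallout stay fixed, so that
        odds(precision) = (recall / fallout) * odds(generality).
    """
    def odds(fraction):
        return fraction / (1.0 - fraction)

    # Rescale the precision odds by the ratio of the generality odds.
    scale = odds(g_to / 1000.0) / odds(g_from / 1000.0)
    new_odds = odds(precision_pct / 100.0) * scale
    return 100.0 * new_odds / (1.0 + new_odds)


# Fallout figure quoted in the text.
print(round(fallout_pct(5437, 58602), 3))

# Collection A converted from G = 23.6 to G = 3.4, and B from G = 3.4 to G = 23.6.
print(round(adjust_precision(14.8, 23.6, 3.4), 1))
print(round(adjust_precision(3.2, 3.4, 23.6), 1))
```

With the rounded inputs from the table, the converted values come out close to the tabulated adjusted precision ratios; small differences are to be expected from rounding of the inputs.
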
However, it is known that 257 of the documents in collection B were included in relation to questions dealing with the theory of aircraft structures; if it is assumed that these were never retrieved by any of the 42 questions on aerodynamics, then this would reduce N for collection B from 1400 to 1143, which is shown as B1 in Fig. 3.34T, where the new generality number and fallout ratio are given. The fallout, at 8.333%, is now closer to, but still does not reach, the level for collection A. It is therefore clear that, if the performances are to be equated, it is necessary to hypothesise that in collection B there is a further subset of documents which are not retrieved by the questions. This number can be found by calculating the size of a hypothetical collection, B2, which would result in a performance identical to that of collection A; the size of this if'