CRANV2
Aslib Cranfield Research Project: Factors Determining the Performance of Indexing Systems: Volume 2
Citation indexing and bibliographic coupling
chapter
Cyril Cleverdon
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
FIGURE 7.12T

Index Language: Citation Indexing and Bibliographic Coupling (Weighted)
Documents Relevance: 1-4
Number of Documents in Collection: 1400
Number of Questions: 42 (Subset 2)
Number of Relevant Documents: 198
Generality Number: 3.4

Weighted      Documents Retrieved     Recall     Precision    Fallout
Coupling      Rel.      Non-rel.      Ratio      Ratio        Ratio
Strength
1-2           131       5495*         66.1%       2.3%*       9.377%*
3-5           122       1758          61.6%       6.5%        3.000%
6-10          111       1387          56.0%       7.4%        2.367%
11-15          90        854          45.5%       9.5%        1.457%
16-20          67        432          33.6%      13.4%        0.737%
21-30          51        207          25.7%      19.7%        0.353%
31-50          36        126          19.1%      23.1%        0.221%
51-80          23         59          11.6%      26.0%        0.101%
81-150         12         20           6.0%      37.5%        0.034%
150+            2          1           1.0%      66.7%        0.002%
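The three ratios in Fig. 7.12T can be checked from the retrieved counts. The following is a minimal sketch in Python, not part of the original report. It assumes the usual definitions (recall = relevant retrieved / total relevant; precision = relevant retrieved / total retrieved; fallout = non-relevant retrieved / total non-relevant) and assumes that the non-relevant total is taken over all 42 x 1400 document/question pairs, which is the denominator that reproduces the printed fallout figures. The function and constant names are illustrative only.

```python
# Collection parameters as printed in the heading of Fig. 7.12T.
QUESTIONS = 42
COLLECTION_SIZE = 1400
TOTAL_RELEVANT = 198


def ratios(rel_retrieved, nonrel_retrieved):
    """Return (recall, precision, fallout) as percentages.

    Assumption: the fallout denominator is every non-relevant
    document/question pair, i.e. 42 * 1400 - 198 = 58,602.
    """
    pairs = QUESTIONS * COLLECTION_SIZE
    total_nonrelevant = pairs - TOTAL_RELEVANT
    recall = 100 * rel_retrieved / TOTAL_RELEVANT
    precision = 100 * rel_retrieved / (rel_retrieved + nonrel_retrieved)
    fallout = 100 * nonrel_retrieved / total_nonrelevant
    return recall, precision, fallout


def generality_number():
    """Relevant documents per 1000 document/question pairs."""
    return 1000 * TOTAL_RELEVANT / (QUESTIONS * COLLECTION_SIZE)


if __name__ == "__main__":
    # Largest and smallest retrieved groups in the table.
    for rel, nonrel in [(131, 5495), (2, 1)]:
        recall, precision, fallout = ratios(rel, nonrel)
        print(f"{rel:>4} {nonrel:>5}  R={recall:.1f}%  P={precision:.1f}%  F={fallout:.3f}%")
    print(f"generality number = {generality_number():.1f}")
```

Under these assumptions the computed values agree with the printed recall and fallout columns to within rounding of the last digit. The printed figures appear in places to be truncated rather than rounded, so differences of one unit in the final place are to be expected.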
final figure of relevant and non-relevant documents retrieved, but replaces
the groups formed at the various coupling levels as given in earlier totals
with new groups based on the weighted scores. The results on the 42
aerodynamic questions are shown in Fig. 7.12T; although different groups
are formed, there appears to be little variation from the performance for
the same document/question set presented in Fig. 7.2T.
As stated in the opening chapter of this volume, we have considerable
reservations in presenting these results, in particular when it comes to
attempting to make comparison with the performance obtained by conventional
methods. One thing that can be stated positively is that the same inverse
relationship between recall and precision exists; bibliographic coupling is
a precision device which has very much the same effect as coordination in a
conventional system.
Since approximately 12% of the documents did not contain any
references, it was inevitable that the maximum recall ratio should fall well
short of 100%. In the event it appears that, with this collection, something
around 70% recall might be expected; for any recall ratio lower than this,
the performance appears to compare quite favourably with conventional
indexing,