CRANV2
Aslib Cranfield Research Project: Factors Determining the Performance of Indexing Systems: Volume 2
Citation indexing and bibliographic coupling
chapter
Cyril Cleverdon
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
In the method used for compiling the scores in the above results, what
might be described as the "entry document" was, in the scoring, also counted
as a successfully retrieved document; in other words, a previously known
relevant document was scored as being a successfully retrieved relevant
document. To put it at its simplest, Q227 (as can be seen from Fig. 7.1T)
has two relevant documents, numbers 2087 and 2088. When
document 2087 was used as an "entry document", it was found to have
three references in common with document 2088, and therefore both documents
were entered as being retrieved at a coupling strength of 3. To take another
example, Q100 has four relevant documents, numbers 1785, 1786, 1787 and
1788. In the test search, documents 1787 and 1788 were found to have a
coupling strength of 6, and documents 1785 and 1786 had a coupling strength
of 3. However, there were no references that were common to the pair of
documents 1785 and 1786 on the one hand or the pair of documents 1787 and
1788 on the other hand. In spite of this, all four relevant documents would be
scored as having been retrieved at a coupling strength of 3 and lower. As a
third example, for Q116 there were six relevant documents,
numbers 1317, 1574, 1575, 1576, 1578 and 1656. In the search, document
1576 had a coupling strength of 6 with documents 1574 and 1578, and a
coupling strength of 2 with documents 1575 and 1317. In addition document
1317 had a coupling strength of 2 with document 1656. Therefore, at a
coupling strength of 2 this would be recorded as a successful retrieval of
all six relevant documents.
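As a minimal sketch of the first scoring method, the Python below treats bibliographic coupling strength as the number of references two documents share, and scores as retrieved every relevant document that belongs to a relevant pair coupled at or above the cut-off, the entry document included. The pair strengths are those quoted above for Q100 and Q116; the reference lists passed to `coupling_strength` are hypothetical, and the function and constant names are illustrative only.

```python
def coupling_strength(refs_a, refs_b):
    """Bibliographic coupling strength: number of references in common."""
    return len(set(refs_a) & set(refs_b))


def method1_retrieved(relevant, pair_strengths, cutoff):
    """First scoring method: every relevant document belonging to a relevant
    pair coupled at strength >= cutoff is scored as retrieved, the entry
    document included."""
    retrieved = set()
    for (a, b), strength in pair_strengths.items():
        if strength >= cutoff and a in relevant and b in relevant:
            retrieved.update((a, b))
    return retrieved


# Pair strengths as reported in the text.
Q100_RELEVANT = {1785, 1786, 1787, 1788}
Q100_PAIRS = {(1787, 1788): 6, (1785, 1786): 3}

Q116_RELEVANT = {1317, 1574, 1575, 1576, 1578, 1656}
Q116_PAIRS = {
    (1576, 1574): 6, (1576, 1578): 6,
    (1576, 1575): 2, (1576, 1317): 2,
    (1317, 1656): 2,
}

# Hypothetical reference lists sharing three references (strength 3, as for 2087/2088).
print(coupling_strength(["r1", "r2", "r3", "r4"], ["r2", "r3", "r4", "r9"]))  # 3
print(sorted(method1_retrieved(Q100_RELEVANT, Q100_PAIRS, cutoff=3)))  # [1785, 1786, 1787, 1788]
print(sorted(method1_retrieved(Q116_RELEVANT, Q116_PAIRS, cutoff=2)))  # [1317, 1574, 1575, 1576, 1578, 1656]
```

Raising the cut-off to 4 for Q100 leaves only the pair 1787/1788, and raising it to 6 for Q116 leaves 1574, 1576 and 1578.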
By the second method of presenting the results, allowance would be
made for these various situations. With Q227, the "entry document" would
be eliminated from the scoring; it would be considered that there was only
one relevant document, and that this was retrieved. With Q100, however,
the first "entry document" would be eliminated from the scoring, but since
there was no link between the two pairs of documents, it would be considered
that of the three remaining relevant documents, two had been retrieved.
With Q116, the "entry document" would be eliminated from the scoring, but
since the other five documents were linked either directly or indirectly with
the "entry document", all these five documents would be included in the
scoring.
On the other hand, with questions such as Q122 or Q132, where no relevant
documents were retrieved, the total of relevant documents would in each case
be reduced by one, the entry document being removed as before.
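The second method can be sketched in the same way. Relevant documents are grouped into sets linked directly or indirectly by couplings at or above the cut-off, the entry document is dropped from the relevant total, and within each linked set every member but one is credited as retrieved. That counting rule is an assumption: it is one reading that reproduces the Q227, Q100 and Q116 figures given above and the reduction by one for questions with no retrievals, not a procedure stated in the text. Function names and the document set {1, 2, 3} are hypothetical.

```python
def linked_sets(relevant, pair_strengths, cutoff):
    """Partition relevant documents into sets joined, directly or through
    intermediate documents, by pairs coupled at strength >= cutoff."""
    parent = {doc: doc for doc in relevant}

    def find(doc):
        # Walk to the set representative, halving the path as we go.
        while parent[doc] != doc:
            parent[doc] = parent[parent[doc]]
            doc = parent[doc]
        return doc

    for (a, b), strength in pair_strengths.items():
        if strength >= cutoff and a in parent and b in parent:
            parent[find(a)] = find(b)

    groups = {}
    for doc in relevant:
        groups.setdefault(find(doc), set()).add(doc)
    return list(groups.values())


def method2_score(relevant, pair_strengths, cutoff):
    """ASSUMED reading of the second method: returns (retrieved, total).
    The entry document leaves the total; in each linked set of two or more
    documents, all members except one count as retrieved."""
    linked = [g for g in linked_sets(relevant, pair_strengths, cutoff) if len(g) > 1]
    retrieved = sum(len(g) - 1 for g in linked)
    total = max(len(relevant) - 1, 0)
    return retrieved, total


Q227 = ({2087, 2088}, {(2087, 2088): 3})
Q100 = ({1785, 1786, 1787, 1788}, {(1787, 1788): 6, (1785, 1786): 3})
Q116 = ({1317, 1574, 1575, 1576, 1578, 1656},
        {(1576, 1574): 6, (1576, 1578): 6, (1576, 1575): 2,
         (1576, 1317): 2, (1317, 1656): 2})

print(method2_score(*Q227, cutoff=3))  # (1, 1)
print(method2_score(*Q100, cutoff=3))  # (2, 3)
print(method2_score(*Q116, cutoff=2))  # (5, 5)
print(method2_score({1, 2, 3}, {}, cutoff=1))  # (0, 2): hypothetical, nothing retrieved
```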
The result of this exercise is to produce a new set of performance
figures where there are now only 156 relevant documents, and the results
are presented in Fig. 7.10T. In doing this, only the recall and precision
ratios are changed; the fallout ratio remains the same as in Fig. 7.2T,
since neither the number of non-relevant documents retrieved nor the total
of non-relevant documents is affected.
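To see why only recall and precision move, the sketch below computes the three ratios for Q100 at a coupling strength of 3 under both methods, using the usual definitions: recall over the relevant documents, precision over the retrieved documents, fallout over the non-relevant documents. The relevant counts (4 of 4, then 2 of 3) come from the text; the non-relevant counts are hypothetical placeholders, since they are not given for this question.

```python
def ratios(rel_retrieved, nonrel_retrieved, rel_total, nonrel_total):
    """Recall, precision and fallout for one search from the four counts."""
    recall = rel_retrieved / rel_total
    precision = rel_retrieved / (rel_retrieved + nonrel_retrieved)
    fallout = nonrel_retrieved / nonrel_total
    return recall, precision, fallout


# HYPOTHETICAL non-relevant counts; relevant counts are Q100 at strength 3.
NONREL_RETRIEVED = 6
NONREL_TOTAL = 1000

first = ratios(4, NONREL_RETRIEVED, 4, NONREL_TOTAL)   # entry document counted
second = ratios(2, NONREL_RETRIEVED, 3, NONREL_TOTAL)  # entry document eliminated
print(first)   # recall 1.0, precision 0.4, fallout 0.006
print(second)  # recall 2/3, precision 0.25, fallout 0.006
```

Neither non-relevant count differs between the two methods, so fallout is identical, while recall falls from 4/4 to 2/3 and precision from 4/10 to 2/8.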
It was earlier suggested that it would be reasonable to compare the
results by this method with those obtained by the coordination level
cut-off. However, since eliminating 42 relevant documents has changed the
generality number, and precision (unlike fallout) depends on generality, the
comparison has to be made on a recall/fallout graph, as in Fig. 7.11P, where
the results are compared with the Single Term index languages that gave the
best and worst performance.
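The dependence of precision on generality can be made explicit. If G is the fraction of the collection relevant to a question, r the recall and f the fallout, then precision is rG / (rG + f(1 - G)); recall and fallout are each normalised within their own class of documents and so do not depend on G. The sketch below expresses generality as a fraction rather than as a per-thousand figure, and assumes a hypothetical denominator for G together with hypothetical recall and fallout values. It shows that the same point on a recall/fallout graph implies lower precision once the relevant total drops from 198 (156 + 42) to 156.

```python
def precision_from(recall, fallout, generality):
    """Precision implied by recall r, fallout f and generality G
    (fraction of the collection that is relevant): rG / (rG + f(1 - G))."""
    hits = recall * generality
    return hits / (hits + fallout * (1 - generality))


RELEVANT_BEFORE = 156 + 42  # 198 relevant documents before the adjustment
RELEVANT_AFTER = 156
POOL = 42_000  # HYPOTHETICAL denominator for generality

for relevant in (RELEVANT_BEFORE, RELEVANT_AFTER):
    g = relevant / POOL
    # Same hypothetical recall (0.6) and fallout (0.02); only generality differs.
    print(relevant, round(precision_from(0.6, 0.02, g), 3))
```

As a check on the formula: with 50 relevant documents in a collection of 1000, recall 0.6 retrieves 30 relevant documents and fallout 0.02 retrieves 19 of the 950 non-relevant ones, so precision is 30/49, which is what `precision_from(0.6, 0.02, 0.05)` returns.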
Further tests were carried out in which account was taken of the
proportional match between documents, based on the number of
references in the documents concerned. The procedure for doing this was
described on pages 110 and 112 of Vol. I. It can make no difference to the