Aslib Cranfield Research Project: Factors Determining the Performance of Indexing Systems: Volume 2

Chapter: Citation indexing and bibliographic coupling
Cyril Cleverdon and Michael Keen
Cranfield

An investigation supported by a grant to Aslib by the National Science Foundation. Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

In the method used for compiling the scores in the above results, what might be described as the "entry document" was itself counted as a successfully retrieved document; in other words, a previously known relevant document was scored as a successfully retrieved relevant document. To put it at its simplest, Q227 (as can be seen from Fig. 7.1T) has two relevant documents, numbers 2087 and 2088. When document 2087 was used as the "entry document", it was found to have three references in common with document 2088, and therefore both documents were entered as retrieved at a coupling strength of 3.

To take another example, Q100 has four relevant documents, numbers 1785, 1786, 1787 and 1788. In the test search, documents 1787 and 1788 were found to have a coupling strength of 6, and documents 1785 and 1786 a coupling strength of 3. However, neither of documents 1785 and 1786 had any reference in common with either of documents 1787 and 1788. In spite of this, all four relevant documents would be scored as retrieved at coupling strengths of 3 and lower.

As a third example, Q116 has six relevant documents, numbers 1317, 1574, 1575, 1576, 1578 and 1656. In the search, document 1576 had a coupling strength of 6 with documents 1574 and 1578, and a coupling strength of 2 with documents 1575 and 1317. In addition, document 1317 had a coupling strength of 2 with document 1656.
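The first scoring method can be sketched as follows. This is a minimal illustration, not the procedure used in the project: coupling strength is taken as the number of references two documents share, the pairwise strengths for Q100 and Q116 are those reported above, and the function names and the reference lists in the usage example are hypothetical.

```python
# Minimal sketch of bibliographic coupling and of the first scoring method.
# Function names are hypothetical; the coupling strengths are those reported
# in the text for Q100 and Q116.


def coupling_strength(refs_a, refs_b):
    """Coupling strength: the number of references two documents have in common."""
    return len(set(refs_a) & set(refs_b))


def retrieved_first_method(strengths, relevant, level):
    """First scoring method (entry document counted as retrieved).

    A relevant document is scored as retrieved at `level` if it couples at
    strength >= level with any other relevant document, whether or not it is
    linked to the entry document.
    """
    retrieved = set()
    for pair, strength in strengths.items():
        # Only couplings between two relevant documents are considered.
        if strength >= level and pair <= relevant:
            retrieved |= pair
    return retrieved


# Pairwise coupling strengths among relevant documents, as given in the text.
Q100_RELEVANT = {1785, 1786, 1787, 1788}
Q100_STRENGTHS = {
    frozenset({1787, 1788}): 6,
    frozenset({1785, 1786}): 3,
}

Q116_RELEVANT = {1317, 1574, 1575, 1576, 1578, 1656}
Q116_STRENGTHS = {
    frozenset({1576, 1574}): 6,
    frozenset({1576, 1578}): 6,
    frozenset({1576, 1575}): 2,
    frozenset({1576, 1317}): 2,
    frozenset({1317, 1656}): 2,
}

if __name__ == "__main__":
    # Hypothetical reference lists sharing three references, like 2087 and 2088.
    print(coupling_strength(["R1", "R2", "R3", "R4"], ["R2", "R3", "R4", "R9"]))  # 3
    print(sorted(retrieved_first_method(Q100_STRENGTHS, Q100_RELEVANT, 3)))  # [1785, 1786, 1787, 1788]
    print(sorted(retrieved_first_method(Q116_STRENGTHS, Q116_RELEVANT, 6)))  # [1574, 1576, 1578]
    print(sorted(retrieved_first_method(Q116_STRENGTHS, Q116_RELEVANT, 2)))  # [1317, 1574, 1575, 1576, 1578, 1656]
```

Because a document is counted as soon as it couples with any other relevant document, both unlinked pairs of Q100 are counted at level 3, and document 1656 of Q116, which couples only with 1317, is counted at level 2.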
Therefore, at a coupling level of 2, this would be recorded as a successful retrieval of all six relevant documents.

By the second method of presenting the results, allowance would be made for these various situations. With Q227, the "entry document" would be eliminated from the scoring; it would be considered that there was only one relevant document, and that this was retrieved. With Q100, however, the first "entry document" would be eliminated from the scoring, but since there was no link between the two pairs of documents, it would be considered that, of the three remaining relevant documents, two had been retrieved. With Q116, the "entry document" would be eliminated from the scoring, but since the other five documents were linked either directly or indirectly with the "entry document", all five would be included in the scoring. On the other hand, with questions such as Q122 or Q132, where no relevant documents were retrieved, the total of relevant documents would in each case be reduced by one.

The result of this exercise is a new set of performance figures based on only 156 relevant documents; these results are presented in Fig. 7.10T. Only the recall and precision ratios are changed; the fallout ratio remains the same as in Fig. 7.2T. It was suggested earlier that it would be reasonable to compare the results obtained by this method with those obtained by the coordination level cut-off. However, since eliminating 42 relevant documents has changed the generality number, and precision depends on generality whereas recall and fallout do not, the comparison has to be made on a recall/fallout graph. This is done in Fig. 7.11P, where comparison is made with the Single Term index languages that gave the best and the worst performance.

Further tests were carried out in which account was taken of the proportional match between documents, based on the number of references in the documents concerned.
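The second method of scoring can also be sketched in Python. This is a minimal illustration under a stated assumption: a remaining relevant document counts only if it is linked to the entry document, directly or through other relevant documents, by couplings at or above the chosen level. That rule reproduces the Q227 and Q116 cases described above but not the Q100 case, where two of the three remaining documents are counted, so Q100 is left out. The function names are hypothetical, and the choice of 1576 as the entry document for Q116 is also an assumption, since the text does not say which document served as the entry.

```python
from collections import deque

# Minimal sketch of the second scoring method. Assumption (not stated in the
# text): a relevant document counts only if it is linked to the entry document,
# directly or indirectly, by couplings >= level. Function names are hypothetical.


def linked_to_entry(strengths, entry, level):
    """Documents reachable from `entry` over couplings of strength >= level,
    found by breadth-first search; the entry document itself is excluded."""
    neighbours = {}
    for pair, strength in strengths.items():
        if strength >= level:
            a, b = tuple(pair)
            neighbours.setdefault(a, set()).add(b)
            neighbours.setdefault(b, set()).add(a)
    seen = {entry}
    queue = deque([entry])
    while queue:
        doc = queue.popleft()
        for other in neighbours.get(doc, set()):
            if other not in seen:
                seen.add(other)
                queue.append(other)
    seen.discard(entry)
    return seen


def second_method_score(strengths, relevant, entry, level):
    """Return (relevant retrieved, relevant total) with the entry document
    eliminated from both counts, so the total drops by one even when nothing
    is retrieved."""
    remaining = relevant - {entry}
    retrieved = linked_to_entry(strengths, entry, level) & remaining
    return len(retrieved), len(remaining)


Q227_RELEVANT = {2087, 2088}
Q227_STRENGTHS = {frozenset({2087, 2088}): 3}

Q116_RELEVANT = {1317, 1574, 1575, 1576, 1578, 1656}
Q116_STRENGTHS = {
    frozenset({1576, 1574}): 6,
    frozenset({1576, 1578}): 6,
    frozenset({1576, 1575}): 2,
    frozenset({1576, 1317}): 2,
    frozenset({1317, 1656}): 2,
}

if __name__ == "__main__":
    # Entry document 2087 for Q227 is given in the text.
    print(second_method_score(Q227_STRENGTHS, Q227_RELEVANT, 2087, 3))  # (1, 1)
    # Entry document 1576 for Q116 is an assumption.
    print(second_method_score(Q116_STRENGTHS, Q116_RELEVANT, 1576, 2))  # (5, 5)
```

Because the entry document is always a relevant document, eliminating it alters only the counts of relevant documents; fallout, which is computed over non-relevant documents alone, is unaffected, as the text notes.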
The procedure for calculating this proportional match was described on pages 110 and 112 of Vol. I. It can make no difference to the