CRANV2
Aslib Cranfield Research Project: Factors Determining the Performance of Indexing Systems: Volume 2
Methods for presentation of results
chapter
Cyril Cleverdon
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
                 RELEVANT    NON-RELEVANT
RETRIEVED            a            b           a + b
NOT RETRIEVED        c            d           c + d
                   a + c        b + d         a + b + c + d = N
                                              (Total Collection)
FIGURE 3.2  2 x 2 CONTINGENCY TABLE
Whether it is correct to regard the values that result from retrieval
tests as components of a 2 x 2 table in the statistical sense, and thus
apply the principles and tests that have been developed for this situation
in statistics, is an unanswered question, and at this stage, therefore, the
use of this table is purely for convenience.
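As an illustration only, the four cells and the marginal totals of Figure 3.2 can be written down as a small data structure. The class name and property names below are assumptions made for this sketch and do not appear in the report; only the symbols a, b, c, d and N are taken from the figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContingencyTable:
    """Hypothetical container for the cells of Figure 3.2."""
    a: int  # retrieved and relevant
    b: int  # retrieved and non-relevant
    c: int  # not retrieved and relevant
    d: int  # not retrieved and non-relevant

    @property
    def retrieved(self) -> int:
        return self.a + self.b  # row total a + b

    @property
    def not_retrieved(self) -> int:
        return self.c + self.d  # row total c + d

    @property
    def relevant(self) -> int:
        return self.a + self.c  # column total a + c

    @property
    def non_relevant(self) -> int:
        return self.b + self.d  # column total b + d

    @property
    def total(self) -> int:
        return self.a + self.b + self.c + self.d  # N, the total collection


# Invented numbers, used only to show how the totals are formed.
table = ContingencyTable(a=2, b=1, c=2, d=5)
print(table.retrieved, table.not_retrieved, table.relevant,
      table.non_relevant, table.total)
```

The row totals and the column totals both sum to N, which is the only property of the table that this sketch relies on.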
As mentioned earlier, there is the necessity of being able to make
a comparison between several sets of results obtained in different
conditions. This can only be done when it is known exactly which variables
are altered in the different situations; two such situations are considered.
Assuming N (the total collection) remains constant, a, b, c and d
can each vary, while a + b (total retrieved) and c + d (total not retrieved)
remain constant. More common is the situation where all the above six
values change, but a + c (total relevant) and b + d (total non-relevant) do
not alter. This is to say that the numbers of relevant and non-relevant
documents remain the same, but the numbers of retrieved and not retrieved,
together with the four categories making up these groups, all vary. In
such cases the change could be due to the 'cut-off' applied, that is the
point in the search where the rules do not allow any further documents to
be examined. At this stage the search is stopped and a record made of
all the documents retrieved, both a (relevant) and b (non-relevant). A
different cut-off results in a different set of values for a and b, thereby
changing c and d, but without in any way affecting a + c or b + d.
Alternatively, the change could be due to different indexing decisions or to
different search strategies.
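The effect of the cut-off described above can be sketched in a few lines. The sketch assumes, purely for illustration, that the search presents documents in a fixed order and that the relevance of each is known; the function name, the ordering and the ten-document collection are hypothetical and are not data from the tests.

```python
def table_at_cutoff(ranked_is_relevant, cutoff):
    """Return (a, b, c, d) when the search stops after `cutoff` documents.

    `ranked_is_relevant` is an assumed list of booleans, one per document
    in the collection, in the order the search would examine them.
    """
    retrieved = ranked_is_relevant[:cutoff]
    not_retrieved = ranked_is_relevant[cutoff:]
    a = sum(retrieved)            # retrieved and relevant
    b = len(retrieved) - a        # retrieved and non-relevant
    c = sum(not_retrieved)        # not retrieved and relevant
    d = len(not_retrieved) - c    # not retrieved and non-relevant
    return a, b, c, d


# Invented collection of N = 10 documents, 4 of them relevant.
ranking = [True, False, True, True, False,
           False, True, False, False, False]

for cutoff in (3, 6):
    a, b, c, d = table_at_cutoff(ranking, cutoff)
    print(f"cut-off {cutoff}: a={a} b={b} c={c} d={d} "
          f"a+c={a + c} b+d={b + d}")
```

Moving the cut-off changes a and b, and therefore c and d, while a + c and b + d stay the same, since the relevance decisions themselves have not been touched.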
The second point to consider is the variables that affect a + c, b + d
and N. If the decision as to what is relevant (a + c) is altered, then it
must also result in a change for the total of non-relevant (b + d); if the
collection size (N) is changed, other values in the table may change.
Although significant changes of this nature occur rarely in operational
retrieval system tests, it is necessary to consider the matter in
experimental tests. Either type of change, i.e. altering the number of relevant
documents or altering the collection size, can vary the number of relevant
documents in relation to the collection size. Examples of the two types