CRANV2
Aslib Cranfield Research Project: Factors Determining the Performance of Indexing Systems: Volume 2
Appendix 5A
appendix
Cyril Cleverdon
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
APPENDIX 5A
FORMULA FOR DOCUMENT RANKING BASED ON
PROBABILITY CONSIDERATIONS
by
G.H. STEARMAN
If a particular question at a particular coordination level results in the
retrieval of a total of N documents, of which R documents are relevant, then
the average time taken to find each of the relevant documents when a large
number of searches is made can be determined on the basis of the following
assumptions:
(a) Each successive document is selected at random.
(b) The same time is taken to inspect each document for relevancy
so that, for example, if a relevant document is found at the
3rd choice, three units of time are taken and if at the 7th
choice, seven units and so on.
If one unit of time is assigned to each choice, then the value of the
average as defined above can be taken as the rank of the relevant document
in a simulated ordering of the N documents.
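The two assumptions above can be tried out numerically. The following is a minimal sketch, not part of the original appendix: it shuffles N documents at random many times, charges one unit of time per inspection, and averages the inspection number at which each relevant document turns up. The function name `simulated_ranks`, the number of trials and the fixed seed are illustrative assumptions.

```python
import random


def simulated_ranks(n, r, trials=20000, seed=0):
    """Estimate the average inspection number at which each relevant document is found.

    Assumes n retrieved documents, r of them relevant, inspected in a
    uniformly random order at one unit of time per document.
    """
    if not 0 <= r <= n:
        raise ValueError("need 0 <= r <= n")
    rng = random.Random(seed)
    # True marks a relevant document; shuffling makes the initial layout irrelevant.
    docs = [True] * r + [False] * (n - r)
    totals = [0] * r
    for _ in range(trials):
        rng.shuffle(docs)
        k = 0
        for position, is_relevant in enumerate(docs, start=1):
            if is_relevant:
                totals[k] += position  # time taken to reach the (k+1)th relevant document
                k += 1
    return [total / trials for total in totals]


if __name__ == "__main__":
    # Average ranks of the 1st, 2nd and 3rd relevant documents among 10 retrieved.
    print(simulated_ranks(10, 3))
```

The averages are estimates only and vary slightly with the seed and the number of trials.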
Let:
Total number of documents retrieved be N
Total number of relevant documents be R
Order of inspection among the N documents be S (S = 1, 2, ..., N)
Order among the R relevant documents be K (K = 1, 2, ..., R)
Then the problem is to find an expression for $P_{K,S}$, the probability
that the Kth relevant document will be found at the Sth inspection, where
N and R are given. Then if $Q_K$ is the simulated ranking, its value is given
by the weighted sum
\[ Q_K = \sum_{S=1}^{N} S \cdot P_{K,S} \]
${}^{A}C_{B}$ means the number of ways of choosing B items from A and is expressed
as
\[ {}^{A}C_{B} = \frac{A!}{B!\,(A-B)!} \]
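As a small check on the notation, the number of ways of choosing B items from A, computed as A!/(B!(A-B)!), can be evaluated directly. This is a sketch only; the name `ways_to_choose` is an illustrative assumption, and the result is compared with the Python standard library function `math.comb`.

```python
from math import comb, factorial


def ways_to_choose(a, b):
    """Number of ways of choosing b items from a, via A! / (B! (A-B)!)."""
    if b < 0 or b > a:
        return 0  # no way to choose more items than are available
    return factorial(a) // (factorial(b) * factorial(a - b))


if __name__ == "__main__":
    print(ways_to_choose(5, 2))                 # 10
    print(ways_to_choose(5, 2) == comb(5, 2))   # True
```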
The probability $P_{K,S}$ may be determined as the ratio of the number of
configurations in which the Kth relevant document appears at the Sth inspection
to the total number of ways in which R relevant documents may be
arranged in N positions.
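For small N and R the ratio described above can be obtained by direct counting, without any closed-form expression. The sketch below is an illustration under that assumption and is not the derivation given in this appendix: it lists every way of placing R relevant documents in N positions, counts the configurations in which the Kth relevant document falls at the Sth inspection, and then forms the weighted sum of S times the probability to give the simulated ranking. The names `rank_probabilities` and `simulated_ranking` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def rank_probabilities(n, r):
    """Return p[k][s], the probability that the kth relevant document is found
    at the sth inspection, treating every placement of r relevant documents
    in n positions as equally likely."""
    if not 0 <= r <= n:
        raise ValueError("need 0 <= r <= n")
    counts = {k: {s: 0 for s in range(1, n + 1)} for k in range(1, r + 1)}
    total = 0
    for positions in combinations(range(1, n + 1), r):
        total += 1
        # combinations() yields positions in increasing order, so the kth
        # entry is the inspection at which the kth relevant document appears.
        for k, s in enumerate(positions, start=1):
            counts[k][s] += 1
    return {
        k: {s: Fraction(c, total) for s, c in row.items()}
        for k, row in counts.items()
    }


def simulated_ranking(n, r):
    """Weighted sum of s * p[k][s] over s = 1..n, for each k = 1..r."""
    p = rank_probabilities(n, r)
    return {k: sum(s * prob for s, prob in row.items()) for k, row in p.items()}


if __name__ == "__main__":
    print(simulated_ranking(5, 2))
```

With N = 5 and R = 2 there are 10 configurations, and the counting gives simulated rankings of 2 and 4 for the first and second relevant documents. Exhaustive counting grows quickly with N and is intended only as a check on small cases.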