Aslib Cranfield Research Project: Factors Determining the Performance of Indexing Systems: Volume 2
Cyril Cleverdon, Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation. Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

APPENDIX 5A
FORMULA FOR DOCUMENT RANKING BASED ON PROBABILITY CONSIDERATIONS
by G.H. STEARMAN

If a particular question at a particular coordination level results in the retrieval of a total of N documents, of which R documents are relevant, then the average time taken to find each of the relevant documents when a large number of searches is made can be determined on the basis of the following assumptions:

(a) Each successive document is selected at random.
(b) The same time is taken to inspect each document for relevancy so that, for example, if a relevant document is found at the 3rd choice, three units of time are taken and if at the 7th choice, seven units and so on.

If one unit of time is assigned to each choice, then the value of the average as defined above can be taken as the rank of the relevant document in a simulated ordering of the N documents.

Let:
Total number of documents retrieved be N
Total number of relevant documents be R
Order of N be S (S = 1, 2, ..., N)
Order of R be K (K = 1, 2, ..., R)

Then the problem is to find an expression for $P_{K,S}$, the probability that the Kth relevant document will be found at the Sth inspection, where N and R are given. Then if $Q_K$ is the simulated ranking, its value is given by the weighted sum

\[ Q_K = \sum_{S=1}^{N} S \cdot P_{K,S} \]

${}^{A}C_{B}$ means the number of ways of choosing B items from A and is expressed as

\[ {}^{A}C_{B} = \frac{A!}{B!\,(A-B)!} \]

The probability $P_{K,S}$ may be determined as the ratio of the number of configurations in which the Kth relevant document appears at the Sth inspection and the total number of ways in which R relevant documents may be arranged in N positions.
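The ratio just described can be checked numerically for small N and R. The Python sketch below is illustrative only and is not part of the original appendix: the function names are hypothetical, and a "configuration" is assumed to mean a choice of R positions out of N for the relevant documents. The first function counts configurations directly, as the text defines P_{K,S}. The second uses a closed form that the counting argument suggests (K-1 relevant documents among the first S-1 positions, R-K among the remaining N-S); this excerpt does not state that form, so it is included only as a cross-check.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def p_ks_by_enumeration(n, r, k, s):
    """P_{K,S} by direct counting over every placement of R relevant
    documents in N positions (assumed meaning of 'configuration')."""
    favourable = 0
    total = 0
    for positions in combinations(range(1, n + 1), r):
        total += 1
        # Each tuple is in ascending order, so positions[k - 1] is the
        # inspection at which the K-th relevant document is found.
        if positions[k - 1] == s:
            favourable += 1
    return Fraction(favourable, total)


def p_ks_closed_form(n, r, k, s):
    """Assumed closed form, not stated in the excerpt:
    C(S-1, K-1) * C(N-S, R-K) / C(N, R)."""
    return Fraction(comb(s - 1, k - 1) * comb(n - s, r - k), comb(n, r))


def simulated_rank(n, r, k):
    """Q_K as the weighted sum of S * P_{K,S} over S = 1, ..., N."""
    return sum(s * p_ks_by_enumeration(n, r, k, s) for s in range(1, n + 1))
```

For N = 5 and R = 2, `simulated_rank(5, 2, 1)` evaluates to 2 and `simulated_rank(5, 2, 2)` evaluates to 4, and the two probability functions agree for every K and S. Exact fractions are used so that the comparison is not affected by floating-point rounding.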