CRANV2
Aslib Cranfield Research Project: Factors Determining the Performance of Indexing Systems: Volume 2
Methods for presentation of results
chapter
Cyril Cleverdon
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
of every one of the 221 questions.
Method 5 differs from all other methods described so far in being
based on actual retrieval results obtained in testing. The method was
generally known as 'recall levels', because a series of recall ratios
is chosen in advance, and the performance results closest to the
chosen recall levels are used to obtain the totals, irrespective of the
coordination level of the search terms. Ideally this method should
be applied to each individual question in a set, with the recall and
precision ratios attained by each question being recorded when closest
to 5% recall, then 10% recall, and so on. The calculations by Method
5 approximated to this by using the recall levels of the nine retrieving
term groups. The recall ratios of these retrieving term groups were
arranged by a set of twenty-one recall levels, being 0%, 5%, 10% etc.
to 100%, and then the results in figures thus arranged were used to
obtain twenty-one sets of recall and precision ratios. Fig. 3.31TP
gives the table and plot of results, and the large number of performance
points on the plot show a slight scatter through which the performance
curve is drawn.
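
The ideal per-question form of the recall levels method can be illustrated with a minimal sketch. It is not the calculation used in the test, which worked from the nine retrieving term groups. The names `closest_to_levels` and `average_at_levels` are hypothetical, and averaging the selected ratios across questions is an assumption made only for illustration; the text does not state how the totals were formed.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (recall %, precision %) for one search result

# Twenty-one recall levels: 0%, 5%, 10%, ... 100%.
RECALL_LEVELS = [5 * i for i in range(21)]


def closest_to_levels(points: List[Point], levels=RECALL_LEVELS) -> List[Point]:
    """For each chosen recall level, pick the result whose recall is closest.

    Ties go to the first such result in the input order (an assumption).
    """
    return [min(points, key=lambda p: abs(p[0] - level)) for level in levels]


def average_at_levels(per_question: List[List[Point]], levels=RECALL_LEVELS) -> List[Point]:
    """Average the selected recall and precision ratios over all questions.

    Averaging of ratios is assumed here purely for illustration.
    """
    picked = [closest_to_levels(points, levels) for points in per_question]
    n = len(picked)
    return [
        (sum(q[i][0] for q in picked) / n, sum(q[i][1] for q in picked) / n)
        for i in range(len(levels))
    ]
```

Because each question contributes exactly one point at every chosen level, the number of coordination levels a question happens to have no longer affects how the results are combined.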
Method 6 was known as 'Document output cutoff method', and was
based on quite different principles to those already discussed. To
explain this method, it is first necessary to consider the effect of the
'conventional' search cutoff method used in the test. This, as has been
explained, was based on the coordination level, which is to say that with,
for instance, a six-term question, the search result would be recorded
for a coordination of all six terms, then it would be recorded for a
coordination of five terms, then for a coordination of four terms and so
on. It was this method of search cutoff, with questions having a range
of different potential coordination levels, that caused the problem in
totalling the results of the whole set of questions, and Method 6,
involving a document output cutoff, seemed to overcome this problem.
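
The coordination level cutoff can be sketched as follows. This is only an illustration under stated assumptions: a document is taken to be retrieved at a given level when it shares at least that many terms with the question, precision is reported as zero when nothing is retrieved, and the function name `coordination_results` and its parameters are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def coordination_results(
    question_terms: Set[str],
    doc_terms: Dict[str, Set[str]],
    relevant: Set[str],
) -> List[Tuple[int, float, float]]:
    """Return (coordination level, recall %, precision %) for each level,
    from all question terms down to a single term."""
    results = []
    for level in range(len(question_terms), 0, -1):
        # Assumed matching rule: at least `level` question terms in common.
        retrieved = {
            doc for doc, terms in doc_terms.items()
            if len(question_terms & terms) >= level
        }
        hits = len(retrieved & relevant)
        recall = 100.0 * hits / len(relevant) if relevant else 0.0
        precision = 100.0 * hits / len(retrieved) if retrieved else 0.0
        results.append((level, recall, precision))
    return results
```

A six-term question yields six such results and a three-term question only three, so the results of different questions do not line up level for level. This is the difficulty in totalling that the passage describes.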
To apply this method, it was first necessary to obtain a ranked
order of documents for every question, and, in our case, this had to
be based on the coordination level cutoff results. A method of doing this
was developed, but it entailed a considerable amount of effort.
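
The method actually developed is not described here, so the following is only a sketch of one way a ranked order might be derived from coordination levels and then cut off at fixed output sizes. The tie-breaking rule (by document identifier), the cutoff values supplied by the caller, and the names `rank_by_coordination` and `output_cutoff_results` are all assumptions for illustration.

```python
from typing import Dict, List, Set, Tuple


def rank_by_coordination(question_terms: Set[str], doc_terms: Dict[str, Set[str]]) -> List[str]:
    """Rank documents by the number of question terms they match (highest first).

    Documents matching no term are left out. Ties are broken by document
    identifier, which is an arbitrary assumption.
    """
    scored = [(len(question_terms & terms), doc) for doc, terms in doc_terms.items()]
    matched = [(score, doc) for score, doc in scored if score > 0]
    matched.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc for _, doc in matched]


def output_cutoff_results(
    ranking: List[str], relevant: Set[str], cutoffs: List[int]
) -> List[Tuple[int, float, float]]:
    """Return (cutoff, recall %, precision %) for the top `cutoff` documents."""
    results = []
    for cutoff in cutoffs:
        top = ranking[:cutoff]
        hits = sum(1 for doc in top if doc in relevant)
        recall = 100.0 * hits / len(relevant) if relevant else 0.0
        precision = 100.0 * hits / len(top) if top else 0.0
        results.append((cutoff, recall, precision))
    return results
```

With the same cutoffs applied to every question, each question supplies a result at every cutoff, whatever its number of search terms.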
The decision as to which method to use for presentation of the
results was not easy to make and has probably involved more discussion,
both amongst ourselves and with other people, than any other single
aspect of the test. The necessity for the particular series of attempts
to total the results was due to the problem created by the coordination
level cutoff. It seems reasonable to assume that the final method
discussed, the document output cutoff method, would be most satisfactory,
since it eliminated the basic problem of totalling different sets of results,
but it appeared to involve more effort than could be afforded.