CRANV2
Aslib Cranfield Research Project: Factors Determining the Performance of Indexing Systems: Volume 2
Conclusions
chapter
Cyril Cleverdon
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
page 14 of Volume 1) by anybody with reasonable knowledge of the subject
field.
On the other hand, if the evaluation is only intended to cover a
sub-system of the complete operational system, such as the index language,
then there is not the same necessity of having "user relevance" decisions;
in fact, such decisions could introduce an additional variable which might
militate against the interpretation of the test results, and a set of "stated
relevance" decisions could be more satisfactory.
So far the argument has been concerned with the evaluation of operat-
ional systems. All the tests of experimental systems have been or are being
conducted in artificial, created environments. Under such circumstances,
"user relevance" decisions cannot be obtained, and in the few tests so far
carried out, "stated relevance" decisions of one kind or another have been
used. However, in this particular project, as explained in the first Volume
(pages 21 - 23) an endeavour was made to simulate "user relevance" decisions.
At the same time (and contrary to what was done in Cranfield I), we
deliberately eschewed any effort to interpret the stated needs; in all cases the
search terms were based solely on the terminology of the question. Whether
the original decision to simulate user relevance decisions was correct has
already been considered (Vol. 1, page 114) and tentatively the conclusion was
there reached that it might have assisted the interpretation of the test results
if, instead, stated relevance decisions had been used. On the whole, this is
a view to which we would still subscribe but for one fact. If stated relevance
decisions had been used, and assuming the test results had shown a similar
superiority of Single Term Natural Language, then it would have been virtually
impossible to refute an argument that the results were unduly influenced by
the relevance decisions.
In the artificial situation, a person - or a group of persons - is
presented with a search question (which may have been devised by someone
else) and a set of documents (or their surrogates in the form of titles or
abstracts) and told to make a series of decisions as to which documents are
relevant. He can be given specific instructions, such as the type of person
that he is supposed to be or the purpose for which he is supposed to require
the information. Whatever such instructions he may receive, he is ultimately
faced with a sequence of words which make up the question, and another
sequence of words which make up the documents, and by the intensity with which
the words and the meaning of the question appear to match the words and the
meaning of a document, he must decide that a given document is or is not
relevant to a given question. In this artificial situation it seems reasonable
to assume - and such experimental evidence as is available bears out the
assumption - that there will be a closer direct match between the actual
words of a question and a relevant document, than is the case in the natural
situation of a questioner making user relevance decisions. Conversely, and
just as important, there will, in the artificial situation, be a lower match
between the question and a non-relevant document than will often be the case
with user relevance judgements.
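
The notion of a "direct match between the actual words of a question and a
document" can be made concrete with a small sketch. What follows is only an
illustration under stated assumptions, not the procedure used in the project:
the function names (terms, term_overlap), the overlap measure itself, and the
question and titles are all hypothetical and invented for the example.

```python
import re


def terms(text):
    """Lower-case a text and return its set of distinct word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def term_overlap(question, document):
    """Fraction of the question's distinct terms that also occur in the document.

    This is an assumed, deliberately crude measure of word-level match;
    it ignores meaning, word order, synonyms and word endings.
    """
    question_terms = terms(question)
    if not question_terms:
        return 0.0
    return len(question_terms & terms(document)) / len(question_terms)


# Hypothetical question and document titles, invented purely for illustration.
question = "heat transfer in laminar boundary layers"
relevant_title = "laminar boundary layers and heat transfer on a flat plate"
non_relevant_title = "fatigue strength of riveted aluminium joints"

print(term_overlap(question, relevant_title))
print(term_overlap(question, non_relevant_title))
```

On the argument above, a judge working in the artificial situation would tend
to mark as relevant those documents scoring high on some such measure and to
reject those scoring low, whereas a real questioner might well accept a
document sharing few words with his question, or reject one sharing many.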
Under such circumstances, it is highly probable that system perform-
ance will be better with stated relevance decisions, than with user relevance
decisions, since a source of possible error in the complete system has been
eliminated. This is not an important factor in the present investigation,
since the objective is not to obtain maximum performance per se, but is
concerned with the comparison between the performance of different index
languages. The important point is that stated relevance decisions which can