CRANV2
Aslib Cranfield Research Project: Factors Determining the Performance of Indexing Systems: Volume 2
Conclusions
chapter
Cyril Cleverdon
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
only be based on a match between words in the document and the question.
might be expected to favour systems using precise natural language, while
user relevance decisions might logically be expected to favour systems which
bring in groups of related terms. The conclusion is therefore reached
that the method of obtaining relevance decisions in this test could not have
been responsible for the unexpected results, since any influence it might have
had would have tended to work in the opposite direction.
Without going against the above argument, Vickery (Ref. 19) rightly
points out that "There are still verbal links between source document and
question; the questions supplied by the author - some time after [OCRerr]!,,ing the
research - were formulated after the cited papers had been read and possibly
influenced the wording of his question." This raises two separate questions:
firstly, is it very much different to what happens in a real-life situation, and
secondly, is the effect serious enough to distort the test results? To consider
the first point, experience in the evaluation test of Medlars at the National
Library of Medicine has shown that the majority of questioners are already
aware of certain relevant documents before asking for a search to be carried
out. It therefore seems likely that, in real life, search questions must often
be influenced by the terminology of relevant documents, and therefore the
procedure which was adopted in this test for obtaining questions is not far
removed from what normally happens. If, however, the actions of those who
prepared the search questions were significantly different from what happens
in real life, then it is necessary to consider whether the results are likely
to have been distorted. To determine whether this is so would require a
far deeper analysis of the individual searches than has so far been done.
Our own opinion is that if such an analysis were made, it would show that in
the large majority of cases there had been no serious distortion, and it is
difficult to believe that the few cases where it might have occurred would
have been sufficient to produce the significant - and consistent - variations in
performance.
The concept indexing was done by selecting from each document those
concepts which appeared to be of importance. This being an intellectual task
it is not possible to argue that it was done correctly. Readers of the reports
on Cranfield I will recollect that the errors of the indexers were the cause of
a significant number of failures to retrieve relevant documents, but that
considered as a percentage of total indexing, it represented a very low
"error rate". Usually in that test the errors were errors of omission. The
higher level of indexing exhaustivity, and the longer time devoted to indexing
each document made it less likely that these would occur in this project, and
some analysis of the failures to retrieve relevant documents has not revealed
any significant errors in this respect. Certainly it does not seem plausible
to suggest that any such errors could have influenced the comparative results.
While the complete indexing was more exhaustive than would normally
be the case, the assignment of an indexing weight to each concept permitted
the testing of various levels of indexing exhaustivity. The test results are
given in Chapter 4, Section 4, and again show that whatever the level of
indexing exhaustivity might be, the effect of moving from Index Language
I.1.a to Index Language I.6.a is consistent, and there is no evidence to
suggest that the exhaustivity of indexing affected the comparison between
different index languages.
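
The way in which a weight attached to each concept allows several levels of
exhaustivity to be tested from a single body of indexing may be illustrated
by a minimal sketch. It is not the procedure used in the project: the
weight scale (a larger number marking a more important concept), the sample
documents and the function names are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical index: document -> {concept: indexing weight}.
# Assumption: a larger weight marks a more important concept.
Index = Dict[str, Dict[str, int]]


def restrict_exhaustivity(index: Index, min_weight: int) -> Index:
    """Simulate a lower level of exhaustivity by keeping only
    those concepts whose weight is at least min_weight."""
    return {
        doc: {c: w for c, w in concepts.items() if w >= min_weight}
        for doc, concepts in index.items()
    }


def search(index: Index, query_terms: List[str]) -> List[str]:
    """Return the documents indexed with every query term."""
    return sorted(
        doc for doc, concepts in index.items()
        if all(term in concepts for term in query_terms)
    )


# Illustrative data only; not taken from the test collection.
full_index: Index = {
    "doc1": {"boundary layer": 3, "heat transfer": 2, "flat plate": 1},
    "doc2": {"boundary layer": 1, "heat transfer": 3},
}

query = ["boundary layer", "heat transfer"]
for level in (1, 2, 3):
    reduced = restrict_exhaustivity(full_index, level)
    print(level, search(reduced, query))
```

Raising the threshold removes the less important concepts from every
document, so the same question retrieves fewer documents as exhaustivity
falls; the same searches can then be repeated at each level and compared.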
Concerning this level of exhaustivity of indexing, it again becomes
obvious that there was an optimum in regard to this particular document/
question set. The lowest level of exhaustivity of indexing investigated was
the search on titles only; the highest level of exhaustivity occurred with the