CRANV2
Aslib Cranfield Research Project: Factors Determining the Performance of Indexing Systems: Volume 2
Conclusions
chapter
Cyril Cleverdon
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
classes by the use of various recall devices results in a considerable
improvement in performance, which is contrary to the effect observed
with the Single Term index languages. This leads to the following conclusions.
There was in this test an optimum level of specificity in the terms
which were used. The conceptual terms of the Simple Concept index languages
were over-specific when used in natural language, this high level of
specificity being related to the strength of interfixing between the single terms
of the natural language. Because of this, the broadening of the natural
language concepts into more general classes resulted in a significant improve-
ment in performance, in that it helped to overcome the high specificity. On
the other hand, the Single Terms in natural language appear to have been near
to the correct level of specificity; only to the relatively small extent of
grouping true synonyms and word forms could any improvement in performance
be obtained. Contrary to the experience of Simple Concepts, the broadening
of the classes by the use of quasi-synonyms or hierarchical grouping resulted
in a significant loss of performance. In between these two extremes of
Single Term and Simple Concepts came the Controlled Terms. Less specific
than the Concepts but more specific than the Single Terms, the effect of
broadening the classes from the Controlled Terms Basic Terms (Index
Language III.1.a) was to depress the performance, although not to the same
extent as with the Single Terms.
While the evidence is not so easy to interpret from the tables and plots
of the main test results as given in Chapter 4, it is quite obvious that
within the various groups of index languages - where a direct comparison can
be made - there are differences between systems, and that these substantiate
the rankings which are given in Chapter 5.
To restate the main conclusions more precisely:
1. In the environment of this test, it was shown that the best performance
was obtained by the use of Single Term index languages.
2. With these Single Term index languages, the formation of groups of
terms or classes beyond the stage of true synonyms or word forms
resulted in a drop of performance.
3. The use of precision devices such as interfixing and partitioning was
not as effective as the basic precision device of coordination.
In the light of these unexpected conclusions, it is necessary to consider
very carefully the test environment and to see whether there is any factor
which could have distorted the results.
The subject field is a matter on which it is difficult to argue. There
has in the past been a tendency to assume that, with an imprecise (mushy)
subject language, where the same notion can be expressed in several
different ways, there is the necessity for broad grouping of terms in the
index language. Yet it seems possible that this imprecision is such that it
is virtually impossible to make any logical practical grouping or class which
can improve overall performance. To form a single class of two vague,
imprecise terms may merely add confusion to confusion, so that any resulting
improvement in the retrieval of relevant documents is more than outweighed
by the increase in the retrieval of non-relevant documents.
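The trade-off described above can be illustrated with a small sketch. It is
not taken from the test data: the six documents, the three index terms and
the set of relevant documents are all hypothetical, chosen only to show how
grouping two imprecise terms into one class may raise recall while lowering
precision.

```python
# Hypothetical toy collection: document id -> index terms (not Cranfield data).
index = {
    "d1": {"flutter"},
    "d2": {"flutter"},
    "d3": {"vibration"},
    "d4": {"vibration"},
    "d5": {"vibration"},
    "d6": {"oscillation"},
}

# Hypothetical relevance judgements for a question about flutter.
relevant = {"d1", "d2", "d3"}


def search(terms):
    """Return every document indexed by at least one of the given terms."""
    return {doc for doc, doc_terms in index.items() if doc_terms & terms}


def recall_precision(retrieved):
    """Recall = relevant retrieved / all relevant;
    precision = relevant retrieved / all retrieved."""
    hits = len(retrieved & relevant)
    recall = hits / len(relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision


# Search on the single term alone.
narrow = search({"flutter"})

# Search on a class formed by grouping the term with a quasi-synonym.
broad = search({"flutter", "vibration"})

print("single term:", recall_precision(narrow))
print("grouped class:", recall_precision(broad))
```

In this invented case the grouped class retrieves one further relevant
document but two further non-relevant ones, so that recall rises from two
thirds to 1.0 while precision falls from 1.0 to 0.6.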