Aslib Cranfield Research Project: Factors Determining the Performance of Indexing Systems: Volume 2

Conclusions chapter
Cyril Cleverdon, Michael Keen
Cranfield

An investigation supported by a grant to Aslib by the National Science Foundation. Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

classes by the use of various recall devices results in a considerable improvement in performance, which is contrary to the effect observed with the Single Term index languages. This leads to the following conclusions. There was in this test an optimum level of specificity in the terms that were used. The conceptual terms of the Simple Concept index languages were over-specific when used in natural language, this high level of specificity being related to the strength of interfixing between the single terms of the natural language. Because of this, broadening the natural language concepts into more general classes resulted in a significant improvement in performance, in that it helped to overcome the high specificity. On the other hand, the Single Terms in natural language appear to have been near the correct level of specificity; only to the relatively small extent of grouping true synonyms and word forms could any improvement in performance be obtained. Contrary to the experience with Simple Concepts, broadening the classes by the use of quasi-synonyms or hierarchical grouping resulted in a significant loss of performance. Between these two extremes of Single Terms and Simple Concepts came the Controlled Terms. These were less specific than the Concepts but more specific than the Single Terms, and broadening the classes from the Controlled Terms Basic Terms language (Index Language III.1.a) depressed performance, although not to the same extent as with the Single Terms.
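The specificity argument above can be illustrated with a toy retrieval run. The sketch below is a hypothetical illustration, not the Cranfield test procedure: the five documents, their index terms, the relevance judgements, and the class label `BL-PHENOMENA` are all invented. Each vocabulary maps terms to class labels, and a document is retrieved when it shares at least two labels with the query (simple coordination matching). Grouping a word form recovers a missed relevant document, while a broad class pulls in non-relevant ones.

```python
# Toy illustration (hypothetical, invented data): effect of grouping index terms
# into classes on recall and precision under coordination matching.

DOCS = {
    "d1": {"boundary-layer", "transition"},
    "d2": {"boundary-layer", "transitional"},
    "d3": {"boundary-layer", "separation"},
    "d4": {"boundary-layer", "heating"},
    "d5": {"shock", "transition"},
}
RELEVANT = {"d1", "d2"}
QUERY = {"boundary-layer", "transition"}

# Three hypothetical vocabularies, from no grouping to one broad class.
VOCABULARIES = {
    "single terms": {},
    "word forms": {"transitional": "transition"},
    "broad class": {
        t: "BL-PHENOMENA"
        for t in ("transition", "transitional", "separation", "heating")
    },
}


def to_classes(terms, classes):
    """Map each term to its class label; ungrouped terms stand for themselves."""
    return {classes.get(t, t) for t in terms}


def search(query, docs, classes, level):
    """Retrieve documents sharing at least `level` class labels with the query."""
    q = to_classes(query, classes)
    return {d for d, terms in docs.items() if len(q & to_classes(terms, classes)) >= level}


def recall_precision(retrieved, relevant):
    """Recall = hits / relevant; precision = hits / retrieved (0.0 if nothing retrieved)."""
    hits = len(retrieved & relevant)
    recall = hits / len(relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision


if __name__ == "__main__":
    for name, classes in VOCABULARIES.items():
        retrieved = search(QUERY, DOCS, classes, level=2)
        r, p = recall_precision(retrieved, RELEVANT)
        print(f"{name}: retrieved={sorted(retrieved)} recall={r:.2f} precision={p:.2f}")
    # single terms: retrieved=['d1'] recall=0.50 precision=1.00
    # word forms: retrieved=['d1', 'd2'] recall=1.00 precision=1.00
    # broad class: retrieved=['d1', 'd2', 'd3', 'd4'] recall=1.00 precision=0.50
```

These figures come from the invented data only and are not results from the test; the sketch shows just the direction of the effect described in the text.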
While the evidence is not so easy to interpret from the tables and plots of the main test results given in Chapter 4, it is quite clear that within the various groups of index languages, where a direct comparison can be made, there are differences between systems and that these substantiate the rankings given in Chapter 5. To restate the main conclusions more precisely:

1. In the environment of this test, it was shown that the best performance was obtained by the use of Single Term index languages.
2. With these Single Term index languages, the formation of groups of terms or classes beyond the stage of true synonyms or word forms resulted in a drop in performance.
3. The use of precision devices such as interfixing and partitioning was not as effective as the basic precision device of coordination.

In the light of these unexpected conclusions, it is necessary to consider the test environment very carefully and to see whether there is any factor that could have distorted the results. The subject field is a matter on which it is difficult to argue. There has in the past been a tendency to assume that, with an imprecise ("mushy") subject language, where the same notion can be expressed in several different ways, broad grouping of terms in the index language is necessary. Yet it seems possible that this imprecision is such that it is virtually impossible to make any logical, practical grouping or class that can improve overall performance. To form a single class of two vague, imprecise terms may merely add confusion to confusion, so that any resulting improvement in the retrieval of relevant documents is more than outweighed by the increase in the retrieval of non-relevant documents.
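Coordination, named in the text as the basic precision device, can be sketched in the same hedged way. In this hypothetical example (documents, terms, and relevance judgements are invented for illustration and are not taken from the test) the coordination level of a document is the number of query terms it shares, and raising the minimum level required for retrieval trades recall for precision.

```python
# Toy illustration (hypothetical, invented data): coordination level as a precision device.

DOCS = {
    "d1": {"boundary-layer", "transition", "hypersonic"},
    "d2": {"boundary-layer", "transition", "wind-tunnel"},
    "d3": {"boundary-layer", "hypersonic", "heating"},
    "d4": {"transition", "pipe-flow"},
    "d5": {"hypersonic", "inlet"},
    "d6": {"boundary-layer", "separation"},
}
RELEVANT = {"d1", "d2"}
QUERY = {"boundary-layer", "transition", "hypersonic"}


def coordination_level(query, terms):
    """Number of query terms that also index the document."""
    return len(query & terms)


def results_by_level(query, docs, relevant):
    """For each minimum coordination level, return (recall, precision)."""
    table = {}
    for level in range(1, len(query) + 1):
        retrieved = {d for d, t in docs.items() if coordination_level(query, t) >= level}
        hits = len(retrieved & relevant)
        recall = hits / len(relevant)
        precision = hits / len(retrieved) if retrieved else 0.0
        table[level] = (recall, precision)
    return table


if __name__ == "__main__":
    for level, (r, p) in results_by_level(QUERY, DOCS, RELEVANT).items():
        print(f"level >= {level}: recall={r:.2f} precision={p:.2f}")
    # level >= 1: recall=1.00 precision=0.33
    # level >= 2: recall=1.00 precision=0.67
    # level >= 3: recall=0.50 precision=1.00
```

With these invented data, accepting any single match retrieves every document, while requiring all three terms raises precision to 1.00 but halves recall.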