CRANV2
Aslib Cranfield Research Project: Factors Determining the Performance of Indexing Systems: Volume 2
summary
Cyril Cleverdon
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
SUMMARY
The test results are presented for a number of different index languages
using various devices which affect recall or precision. Within the environ-
ment of this test, it is shown that the best performance was obtained with
the group of eight index languages which used single terms. The group of
fifteen index languages which were based on concepts gave the worst perform-
ance, while a group of six index languages based on the Thesaurus of
Engineering Terms of the Engineers Joint Council were intermediary. Of
the single term index languages, the only method of improving performance
was to group synonyms and word forms, and any broader groupings of terms
depressed performance. The use of precision devices such as links gave no
advantage as compared to the basic device of simple coordination.
All results have to be considered within the context of the experimental
environment, but they can be said to substantiate or clarify many of the
findings of Cranfield I. It is conclusively shown that an inverse relationship
exists between recall and precision, whatever the variable may be that is
being changed. The two factors which appear most likely to affect performance
are the level of exhaustivity of indexing and the level of specificity of
the terms in the index language. For any given operational situation, the
optimum levels cannot be categorically stated in advance, but can only be
determined by an evaluation of the system, the main consideration probably
being the subject field.
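
The inverse relationship between recall and precision can be illustrated with a minimal Python sketch of simple coordination, in which a document is retrieved when it shares at least a given number of terms with the question. The sketch assumes the usual definitions (recall as the proportion of relevant documents retrieved, precision as the proportion of retrieved documents that are relevant). The toy index, the question, the relevance judgements, and the function names are all hypothetical and are not taken from the test collection.

```python
def coordination_search(query_terms, index, level):
    """Return ids of documents sharing at least `level` terms with the query."""
    query = set(query_terms)
    return {
        doc_id
        for doc_id, terms in index.items()
        if len(query & set(terms)) >= level
    }


def recall_precision(retrieved, relevant):
    """Return (recall, precision) for a set of retrieved document ids."""
    hits = len(retrieved & relevant)
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical single-term index: document id -> set of index terms.
index = {
    "d1": {"boundary", "layer", "flow", "supersonic"},
    "d2": {"boundary", "layer", "heat"},
    "d3": {"flow", "pipe"},
    "d4": {"supersonic", "flow", "wing"},
    "d5": {"heat", "transfer"},
}
query = ["boundary", "layer", "supersonic", "flow"]
relevant = {"d1", "d4"}  # assumed relevance judgements for this question

# Raising the coordination level narrows the output: in this toy data
# precision rises while recall falls.
for level in range(1, len(query) + 1):
    retrieved = coordination_search(query, index, level)
    recall, precision = recall_precision(retrieved, relevant)
    print(level, sorted(retrieved), round(recall, 2), round(precision, 2))
```

In this invented example, demanding a match on only one term retrieves every relevant document together with as many non-relevant ones, while demanding three or more terms retrieves only relevant material but misses half of it.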
It would be unusual if the characteristics of the subject field used for
this test were such as to make it unique, so the high performance obtained
with the single terms in natural language can be considered to be of some
importance in regard to the use of natural language text as input to
mechanised systems.