CRANV1P1
ASLIB Cranfield Research Project: Factors Determining the Performance of Indexing Systems: VOLUME 1. Design, Part 1. Text
General Considerations
chapter
Cyril Cleverdon
Jack Mills
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
"You had no right to be so intelligent with the Uniterm system; it is meant to be
used by people of low intellect."
"The UDC had an unfair advantage because of the detailed alphabetical index
which you compiled."
"If you had not used the colon device (of the UDC) so much, it would not have
performed so well."
"Subject headings are not meant to be so specific as those you used, and that is
why the alphabetical subject index performed so much better than it would normally
have done."
Although such comments seemed amusing, they were understandable in that in 1961,
the results coming from Cranfield I were contrary to firmly held beliefs, and the
implications of the test results had not been appreciated. However, in a recent paper
(Ref. 8) Richmond writes "... systems designed with a universal approach to the
intellectual organization of information and those designed for limited use in parts of the
whole. The former, when one comes to a specialized field like aeronautics, is a dilute
approach, while the latter is a concentrated one. At Cranfield, the dilute approach was
made through the UDC and through alphabetical subject headings, which are generalized
concept terms. The concentrated one was made through a faceted classification, tailor-
made for the subject and through uniterms, which had a vocabulary of words taken directly
from documents dealing with the subject".
Here are shown the same categorical assertions as are contained in the earlier
quotations, that the UDC and alphabetical subject headings are only for universal
application, that they must not be used in a specialized subject field, and that if so used,
they cannot possibly be as efficient as the "concentrated systems". The fact that all
the experimental evidence is to the contrary appears to mean nothing, nor does the fact
that probably 90% of the operational UDC systems are concerned only with a "concentrated"
subject area. The UDC schedules used in Cranfield I were no exception, having been
developed over a long period by workers in the United Kingdom concerned with highly
specialised collections in the fields of aerodynamics and aeronautical engineering.
Again in the above quotation, there is the same confused thinking when it is
said of the Uniterm system that it has a "vocabulary of words taken directly from
documents dealing with the subject," the implication being that the words found in
the other systems had come from some source outside of the documents. This is,
of course, untrue, for the facet classification, as is reported in Ref. 1, was prepared
by taking the terms used in the literature and arranging them in categories and facets.
Equally so, there is no single term in the alphabetical index to the UDC or in the
alphabetical subject headings which is not found in the list of uniterms, or in its lead-in
vocabulary.
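
The coverage claim in the preceding paragraph can be pictured as a simple set check. The following is a minimal sketch only, reading a lead-in vocabulary as a mapping from sought terms (including synonyms) to index terms. The terms, the name `uncovered_terms`, and the data layout are hypothetical illustrations and are not taken from the Cranfield test material.

```python
# Hypothetical terms for illustration only; not taken from the Cranfield data.
uniterms = {"aerofoil", "boundary layer", "flutter"}

# Lead-in vocabulary, read here as: sought term (or synonym) -> index term.
lead_in = {
    "airfoil": "aerofoil",
    "wing section": "aerofoil",
}


def uncovered_terms(other_terms, uniterms, lead_in):
    """Return, sorted, the terms found neither among the uniterms
    nor among the entries of the lead-in vocabulary."""
    covered = set(uniterms) | set(lead_in)
    return sorted(set(other_terms) - covered)


# Terms drawn from another (hypothetical) index language.
subject_headings = {"flutter", "airfoil", "boundary layer"}

# An empty result means every term is reachable through the
# uniterm list or its lead-in vocabulary.
print(uncovered_terms(subject_headings, uniterms, lead_in))
```

Under these assumptions, a term such as "shock wave" that appears in neither list would be reported as uncovered.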
Unconsciously (because the significance of what was being done was not then
realised) we were providing an additional basis for a similar performance in regard
to recall by providing all four systems investigated with an equally effective
lead-in vocabulary, which is the first basic requirement for all index languages. By
'lead-in vocabulary' is meant a complete list of all the sought terms, including
all necessary synonyms, that are used in the set of documents being indexed or in
the set of questions that is put to the system. While some - in fact, probably most -
operational index languages are deficient in this respect, this is an incidental as