CRANV1P1
ASLIB Cranfield Research Project: Factors Determining the Performance of Indexing Systems: VOLUME 1. Design, Part 1. Text
Testing Techniques
chapter
Cyril Cleverdon
Jack Mills
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
prescriptions contained more related terms than the single term searches did, and
would have required more divisions than the 26 in a single letter code. Another
answer to the posting problem was not to post any related terms on a document when
the natural language term or synonym term (both included in every aggregate of
devices) was already there. This could be done provided that a related term did not
improve the weights. For example, in document 1978 in fig. 6.6, Flow appears as
such as i-7. Because of this, A6 and K7 are really redundant, but on the other hand
the posting of Moving (M) at a weight of 9 is required, since this improves the performance
with regard to weighting. This superfluous posting was done deliberately on the single
terms to enable decoding of all search terms for the interfixing test, but no such
requirement existed in the concept searches, and such posting was left off.
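The posting rule described above can be sketched in a few lines of Python. This is only an illustration, not the procedure actually used in the tests, which were carried out clerically. It assumes that each posting is a letter code with a numeric weight, that the direct (natural language or synonym) posting for Flow in document 1978 has weight 7, and that a related term is worth posting only when its weight is strictly higher than the direct one. The function name and the dictionary layout are hypothetical.

```python
def related_postings_to_keep(direct_weight, related_weights):
    """Keep only the related-term postings that improve on the direct term's weight.

    direct_weight   -- weight of the natural language or synonym posting (assumed numeric)
    related_weights -- mapping of related-term code to weight (hypothetical layout)
    """
    return {
        code: weight
        for code, weight in related_weights.items()
        if weight > direct_weight  # equal or lower weights add nothing
    }


# Values taken from the example in the text: A6 and K7 are redundant, M9 is required.
kept = related_postings_to_keep(7, {"A": 6, "K": 7, "M": 9})
print(kept)
```

Under these assumptions only the posting for Moving (M) at weight 9 survives, which agrees with the example in the text.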
As stated, the first series of tests had been done using the minimum of intellect
in the search programmes, with the result that many documents were retrieved on
nonsensical combinations of terms. At later stages in the test, increasing intelligence
was put into the search programmes; this is another way of saying that the requirements
were more stringent. This was done in various ways, and each time the attempt was
made to identify the particular intellectual decision which had been taken. One example
of this is given in Fig. 6.11, where the search was being carried out on the Controlled
Term Vocabulary. There are four starting terms, Compressible flow, Viscous flow,
Channels and Straightness. Instead of any combination of these being accepted at the
various levels of coordination, the search instructions specifically state, for instance,
that Compressible flow and Viscous flow are not acceptable on their own. In fact, the
definite requirement is that Channels must always be present.
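A search instruction of this kind can be sketched as a coordination-level match with a mandatory term. The Python below is a minimal illustration under stated assumptions: documents are represented as sets of index terms, the coordination level is the number of starting terms a document must share with the question, and the function name and sample documents are hypothetical. It is not a reconstruction of the instructions in the figure.

```python
START_TERMS = {"Compressible flow", "Viscous flow", "Channels", "Straightness"}
REQUIRED = {"Channels"}  # the text states that Channels must always be present


def accepts(doc_terms, level, start_terms=START_TERMS, required=REQUIRED):
    """Return True if a document matches at the given coordination level.

    A document is accepted only if it carries every required term and
    shares at least `level` of the starting terms with the question.
    """
    hits = start_terms & set(doc_terms)
    return required <= hits and len(hits) >= level


# Hypothetical documents, for illustration only.
doc_a = {"Compressible flow", "Viscous flow"}        # two hits, but no Channels
doc_b = {"Channels", "Viscous flow", "Boundary layer"}

print(accepts(doc_a, level=2))  # rejected: Channels is missing
print(accepts(doc_b, level=2))  # accepted: Channels plus one other starting term
```

Without the required-term check, the first document would be retrieved at a coordination level of two on the combination of Compressible flow and Viscous flow alone, which is the kind of match the more stringent instructions were meant to exclude.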
This chapter has only considered the general techniques which were used in
carrying out the tests. Quite inapplicable as far as can be seen to any operational
situation, they gave, albeit with a large amount of clerical effort, all the flexibility
that was required. One point which should be made clear concerns the prior knowledge
regarding which documents were relevant to which question. This knowledge was not
available to the indexers at the time of indexing, so there is no question of
the indexing being slanted towards a particular question. In theory it could have been
available to Mills at the time when he was preparing the groups of related terms and
the various hierarchies. In fact, Mills was doing this work in London while the
indexes were being prepared and the searches were being carried out 50 miles away at
Cranfield. Even if he had had access to this data and had attempted to use it in
preparing these lists, we do not believe it would have made any significant difference
to the results. With regard to the searching, the description given in this chapter
of the methods used should make it obvious that its comprehensive nature precluded
any possibility of influencing the results.