CRANV1P1 ASLIB Cranfield Research Project: Factors Determining the Performance of Indexing Systems: VOLUME 1. Design, Part 1. Text Testing Techniques chapter Cyril Cleverdon Jack Mills Michael Keen Cranfield An investigation supported by a grant to Aslib by the National Science Foundation. Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

prescriptions contained more related terms than the single term searches did, and would have required more divisions than the 26 in a single letter code.

Another answer to the posting problem was not to post any related terms on a document when the natural language term or synonym term (both included in every aggregate of devices) was already there. This could be done provided that a related term did not improve the weights. For example, in document 1978 in Fig. 6.6, Flow appears as such as i-7. Because of this, A6 and K7 are really redundant, but on the other hand the posting of Moving (M) at a weight of 9 is required, since this improves the performance with regard to weighting. This superfluous posting was done deliberately on the single terms to enable decoding of all search terms for the interfixing test, but no such requirement existed in the concept searches, and such posting was left off.

As stated, the first series of tests had been done using the minimum of intellect in the search programmes, with the result that many documents were retrieved on nonsensical combinations of terms. At later stages in the test, increasing intelligence was put into the search programmes; this is another way of saying that the requirements were more stringent. This was done in various ways, and each time an attempt was made to identify the particular intellectual decision which had been taken. One example of this is given in Fig. 6.11, where the search was being carried out on the Controlled Term Vocabulary.
There are four starting terms: Compressible flow, Viscous flow, Channels and Straightness. Instead of any combination of these being accepted at the various levels of coordination, the search instructions specifically state, for instance, that Compressible flow and Viscous flow are not acceptable on their own. In fact, the definite requirement is that Channels must always be present.

This chapter has only considered the general techniques which were used in carrying out the tests. Although, as far as can be seen, quite inapplicable to any operational situation, they gave, albeit with a large amount of clerical effort, all the flexibility that was required.

One point which should be made clear concerns the prior knowledge regarding which documents were relevant to which question. This knowledge was not available to the indexers at the time of indexing, so there is no question of the indexing being slanted towards a particular question. In theory it could have been available to Mills at the time when he was preparing the groups of related terms and the various hierarchies. In fact, Mills was doing this work in London while the indexes were being prepared and the searches were being carried out 50 miles away at Cranfield. Even if he had had access to this data and had attempted to use it in preparing these lists, we do not believe it would have made any significant difference to the results. With regard to the searching, the description given in this chapter of the methods used should make it obvious that its comprehensive nature precluded any possibility of influencing the results.
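
The constrained search described for Fig. 6.11 can be illustrated with a small modern sketch. It is not the procedure used in the project, which was carried out clerically, and it reproduces only the one rule stated in the text: a document counts at a given level of coordination only if Channels is among the starting terms it matches. The document identifiers and their postings are hypothetical, invented for illustration, and the function names are assumptions of this sketch.

```python
from typing import Dict, List, Set

# The four starting terms named in the text for the Fig. 6.11 search.
STARTING_TERMS: Set[str] = {
    "Compressible flow",
    "Viscous flow",
    "Channels",
    "Straightness",
}

# The text states that Channels must always be present.
REQUIRED_TERMS: Set[str] = {"Channels"}


def coordination_level(doc_terms: Set[str]) -> int:
    """Number of starting terms matched, or 0 if a required term is missing."""
    matched = STARTING_TERMS & doc_terms
    if not REQUIRED_TERMS <= matched:
        return 0  # rejected, e.g. Compressible flow + Viscous flow alone
    return len(matched)


def search(index: Dict[str, Set[str]], min_level: int) -> List[str]:
    """Documents whose constrained coordination level is at least min_level."""
    return sorted(
        doc_id
        for doc_id, terms in index.items()
        if coordination_level(terms) >= min_level
    )


# Hypothetical postings, invented for illustration only.
index: Dict[str, Set[str]] = {
    "doc_a": {"Compressible flow", "Viscous flow"},
    "doc_b": {"Channels", "Viscous flow"},
    "doc_c": {"Channels", "Compressible flow", "Viscous flow", "Straightness"},
    "doc_d": {"Channels"},
}

if __name__ == "__main__":
    for level in range(4, 0, -1):
        print(level, search(index, level))
```

In this sketch doc_a matches two starting terms but is never retrieved, because Channels is absent; an unconstrained coordination search would have accepted it at level 2.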