CRANV2
Aslib Cranfield Research Project: Factors Determining the Performance of Indexing Systems
Volume 2: Test Environment
Cyril Cleverdon, Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation. Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

restrictions concerning the actual combination of terms that would be accepted at every coordination level, so as to eliminate the nonsensical combinations. The most satisfactory and carefully applied search rules were used in the controlled language tests, since it was thought that intelligence in searching would best be tested on an index language whose formulation had itself involved an average degree of intelligence. This was Search E, in which all the combinations of acceptable terms were individually selected for each coordination level. It was usual to accept a number of such combinations, with the object of retaining as many of the relevant documents as possible. This search rule was applied to the controlled term index languages (III.1 - III.6), both with and without the precision device of weighting. The sets of acceptable combinations were formulated on the basis of the starting terms of the question, and thus the use of Search E in testing languages other than III.1 (Basic terms) may have resulted in a poorer performance for those languages than is theoretically possible; the reason is that grouping a number of terms together in the later languages might produce nonsensical combinations of terms. One further rule, designed to be used with the various recall languages, was tried. This was Search Type F, also carried out on the controlled term index languages III.2 to III.6. The reasoning behind this search was that in all the previous rules tested, the terms that actually made a match between a document and the search prescription were all treated 'equally'.
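A minimal sketch of how Search E and Search F might be applied, assuming each document is represented as a plain set of index terms and that, for each coordination level, Search E supplies an explicit list of acceptable term combinations. The function names `search_e` and `search_f`, the data layout, and the term labels `S1` to `S4` (starting terms) and `R1` to `R5` (related terms) are hypothetical illustrations, not part of the original test procedure. Search F is modelled here as the Search E rule plus a required minimum number of starting terms in the match.

```python
from typing import Dict, FrozenSet, List, Set

# Hypothetical data model: a document is a set of index terms; for each
# coordination level, Search E supplies a list of acceptable term combinations.
Acceptable = Dict[int, List[FrozenSet[str]]]


def search_e(doc_terms: Set[str], acceptable: Acceptable, level: int) -> bool:
    """Retrieve the document at `level` if it contains at least one of the
    combinations accepted for that coordination level."""
    return any(combo <= doc_terms for combo in acceptable.get(level, []))


def search_f(documents: Dict[str, Set[str]], acceptable: Acceptable, level: int,
             starting_terms: Set[str], min_starting: int) -> List[str]:
    """Apply the Search E rule, then keep only documents whose match includes
    at least `min_starting` of the question's starting terms."""
    hits = []
    for doc_id, terms in documents.items():
        if not search_e(terms, acceptable, level):
            continue
        # Starting terms belong to the search prescription, so the starting
        # terms in the match are those the document shares with them.
        n_starting = len(terms & starting_terms)
        if n_starting >= min_starting:
            hits.append(doc_id)
    return sorted(hits)


# Hypothetical example: two documents, each matching five terms.
starting = {"S1", "S2", "S3", "S4"}
documents = {
    "doc1": {"S1", "S2", "S3", "S4", "R1"},  # four starting terms + one related term
    "doc2": {"R1", "R2", "R3", "R4", "R5"},  # related terms only
}
acceptable = {5: [frozenset({"S1", "S2", "S3", "S4", "R1"}),
                  frozenset({"R1", "R2", "R3", "R4", "R5"})]}

print(search_f(documents, acceptable, 5, starting, 0))  # ['doc1', 'doc2']
print(search_f(documents, acceptable, 5, starting, 1))  # ['doc1']
```

Setting `min_starting=0` reduces the sketch to plain Search E, where both five-term matches count equally; raising it separates a document matched partly on starting terms from one matched on related terms alone.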
For example, if two documents each had a match of five terms with a question using the controlled term index language III.5a (related terms), no distinction would be made between a document which actually had four starting terms and one related term, and a second document which was matched only by related terms, without a single starting term. The first document clearly represents a closer match with the search prescription, and it might generally be assumed that a starting term match is more desirable than any related term match. In Search F, a record was made of the number of starting terms that came up in a given match, with the rules of Search E in use. This record was used to make up sets of results with a given minimum number of starting terms demanded, and results will be given for controlled term languages III.5 and III.6.

Document relevance

Before demonstrating the form of the results obtained when these variables are tested, a single environmental variable will be mentioned. This is the variation made in document relevance, resulting from the scale of four grades of relevance that was followed by the questioners in assessing the relevant documents (see Vol I, p.21). In finding the effect on retrieval performance of these decisions, four sets of results