CRANV2
Aslib Cranfield Research Project: Factors Determining the Performance of Indexing Systems: Volume 2
Test Environment
chapter
Cyril Cleverdon
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
restrictions concerning the actual combination of terms that would be
accepted at every coordination level, so as to eliminate the non-sensical
combinations.
The most satisfactory and carefully applied search rules were
applied to the controlled language tests, since it was thought that
intelligence in searching would be best tested on an index language that
also had an average degree of intelligence used in its formulation. This
was Search E, where all the combinations of acceptable terms were individually
selected for each coordination level. It was usual to accept a number of
such combinations, with the object of retaining as many of the relevant
documents as possible. This search rule was applied to the controlled
term index languages (III.1 - III.6) both with and without the precision
device of weighting. The sets of acceptable combinations were formulated
on the basis of the starting terms of the question, and thus the use of
Search E in testing languages other than III.1 (Basic terms) may have
resulted in a poorer performance for the languages than is theoretically
possible; the reason for this is that the grouping of a number of terms
in the later languages might result in non-sensical combinations of terms.
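The selection made under Search E can be illustrated by a minimal sketch in Python. The function name, the data layout and the sample terms are assumptions made for illustration only and are not taken from the project records; the sketch simply treats a document as retrieved at a coordination level when it contains every term of at least one combination accepted for that level.

```python
# Hypothetical illustration of Search E: names, data layout and terms are assumed.

def highest_accepted_level(document_terms, accepted_combinations):
    """Return the highest coordination level at which the document contains
    every term of at least one accepted combination, or 0 if there is none."""
    terms = set(document_terms)
    best = 0
    for level, combinations in accepted_combinations.items():
        if any(set(combination) <= terms for combination in combinations):
            best = max(best, level)
    return best


# Accepted combinations chosen individually for each coordination level
# (invented terms; several combinations may be accepted at one level).
accepted = {
    2: [("boundary layer", "heat transfer"), ("boundary layer", "supersonic")],
    3: [("boundary layer", "heat transfer", "supersonic")],
}

doc_a = ["boundary layer", "heat transfer", "flat plate"]
doc_b = ["boundary layer", "heat transfer", "supersonic"]
doc_c = ["heat transfer", "supersonic"]

for name, doc in (("A", doc_a), ("B", doc_b), ("C", doc_c)):
    print(name, highest_accepted_level(doc, accepted))
```

In this sketch document C shares two terms with the question but is not retrieved, since that pair was not among the combinations accepted at level 2.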
One further additional rule designed to be used with the various
recall languages was tried. This was Search Type F, also carried out
on the controlled term index languages III.2 to III.6. The reasoning
behind this search was that in all previous rules tested, the terms that
actually made a match between a document and search prescription were
all treated 'equally'. For example, if two documents had a match of
five terms with a question using the controlled term index language III.5a
(related terms), no distinction would be made between a document which
actually had four starting terms and only one related term, and a second
document which was matched only by related terms, without a single
starting term. The first document clearly represents a closer match
with the search prescription, and it might generally be assumed that
a starting term match is more desirable than any related term match.
In Search F, a record was made of the number of starting terms that
came up in a given match, and this was done with the rules of Search E in
use. This was used to make up sets of results with a given minimum
match demanded, and results will be given for controlled term languages
III.5 and III.6.
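The record kept in Search F can be sketched in the same way. The following Python fragment is an illustration under stated assumptions: the function names, the term labels (s1 to s4 for starting terms, r1 to r5 for related terms) and the dictionary of matches are hypothetical, and the matches are assumed to have been obtained already under the rules of Search E.

```python
# Hypothetical illustration of Search F: names and data are assumed.

def starting_term_count(matched_terms, starting_terms):
    """Count how many of the matching terms are starting terms of the question."""
    return len(set(matched_terms) & set(starting_terms))


def select_by_starting_terms(matches, starting_terms, minimum):
    """Keep the documents whose match includes at least `minimum` starting terms."""
    return [
        doc for doc, matched in matches.items()
        if starting_term_count(matched, starting_terms) >= minimum
    ]


starting = {"s1", "s2", "s3", "s4"}

# Two documents, each with a match of five terms, as in the example above.
matches = {
    "first": ["s1", "s2", "s3", "s4", "r1"],   # four starting terms, one related
    "second": ["r1", "r2", "r3", "r4", "r5"],  # related terms only
}

for minimum in (0, 1, 4):
    print(minimum, select_by_starting_terms(matches, starting, minimum))
```

With no minimum demanded the two documents are indistinguishable, while any minimum of one or more starting terms retains only the first.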
Document relevance
Before demonstrating the form of the results obtained when these
variables are tested, a single environmental variable will be mentioned.
This is the variation made in document relevance, resulting from the scale
of four grades of relevance that was followed by the questioners in
assessing the relevant documents (see Vol I, p.21). In finding the
effect on retrieval performance of these decisions, four sets of results