CRANV1P1
ASLIB Cranfield Research Project: Factors Determining the Performance of Indexing Systems: VOLUME 1. Design, Part 1. Text
Testing Techniques
chapter
Cyril Cleverdon
Jack Mills
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
These six index languages appeared to cover all reasonable permutations, since
it was not logical, for instance, to contemplate the use of quasi-synonyms without
the use of synonyms.
The searches were carried out by clerical labour, and the results were
recorded on a score sheet as shown in Fig. 6.7. The actual operation of carrying
out a search became known as 'putting the ruler down the sheets', since the use of
a straight edge to successively uncover the postings for each document was found to
be the best method. The searches were made on the sets of search sheets (as in
Fig. 6.6), where each vertical column deals with one of the question starting terms,
and shows not only the occurrence of the starting term itself, but also the related
terms as described earlier. Often an examination of the postings for a certain
question needed some care in working out, since in one operation the search results
would be recorded for the six different index languages and for the three weights.
However, after a relatively short learning period, the clerical staff had no serious
difficulties. The time required to search a single question varied greatly; with this
particular set of six index languages, it might be anything from ten minutes to one
hour, being dependent on the number of starting terms, the frequencies of postings
for each starting term, and the number of terms related to the starting terms.
The score sheets list the document numbers on the left hand side, and across
the sheet space is given for recording the coordination level (i.e., the number of search
terms that match with the document terms) of each document for each of the six index
languages at each of the three levels of exhaustivity. The way this is done may be
seen by examining a search sheet (Fig. 6.6) for question 181 'Has any work been done
on determination of the nature of compressible viscous flow in a straight channel',
in relation particularly to documents 1963, 1966 and 1978.
The search sheet shows that document 1963 has two of the search terms present,
and a look at the codes shows that they are coded 1, the natural language terms, which
are included in all six languages. Both terms have a weight of 8, and therefore do not
come out at the lowest exhaustivity (weights 9 or 10), but do at the medium and high levels.
The score sheet (Fig. 6.7) records this, the coordination score of 2 being put in every
language at the medium and high levels of exhaustivity. Document 1966 has four of the
search terms present: two natural language (1), one word ending (F) and one quasi-synonym
(K). So taking the highest level of exhaustivity (5-10), every index language will
have a coordination score of at least 2; index languages 3 and 4 will score 3 (1, 1 and F);
index language 5 will also score 3 (1, 1 and K), but index language 6 scores the maximum,
4 (1, 1, F and K), since it accepts both word ending variants and quasi-synonyms.
Considering now the various levels of exhaustivity, index languages 1 to 4 have all their
terms weighted 9 or 10, and so keep the same coordination score at medium and low
exhaustivity, but index languages 5 and 6 have the quasi-synonym weighted 7, so at low
exhaustivity the coordination score drops to 2 and 3 respectively.
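
The scoring of document 1966 can be retraced with a short sketch. It is illustrative only, not the procedure used by the clerical staff. The function name `coordination_score`, the placeholder term names, the use of 9 for terms the text describes as weighted "9 or 10", and the medium threshold of 7 are all assumptions; the text states only the high (5-10) and low (9 or 10) ranges. Each language is listed with only those codes that occur in this example.

```python
# Minimum term weight admitted at each exhaustivity level.
# High (5-10) and low (9 or 10) come from the text; medium = 7 is an assumption.
MIN_WEIGHT = {"high": 5, "medium": 7, "low": 9}

# Codes each index language accepts, limited to the codes in this example:
# "1" natural language, "F" word ending, "K" quasi-synonym.
ACCEPTED = {
    1: {"1"},
    2: {"1"},
    3: {"1", "F"},
    4: {"1", "F"},
    5: {"1", "K"},
    6: {"1", "F", "K"},
}


def coordination_score(postings, language, level):
    """Count distinct search terms matched under one language and level.

    postings: iterable of (search_term, code, weight) for one document.
    """
    matched = {
        term
        for term, code, weight in postings
        if code in ACCEPTED[language] and weight >= MIN_WEIGHT[level]
    }
    return len(matched)


# Document 1966: placeholder term names; 9 stands in for "9 or 10".
DOC_1966 = [
    ("term_a", "1", 9),
    ("term_b", "1", 9),
    ("term_c", "F", 9),
    ("term_d", "K", 7),
]

if __name__ == "__main__":
    for level in ("high", "medium", "low"):
        row = [coordination_score(DOC_1966, lang, level) for lang in range(1, 7)]
        print(level, row)
```

Under these assumptions the six languages score 2, 2, 3, 3, 3, 4 at high and medium exhaustivity and 2, 2, 3, 3, 2, 3 at low exhaustivity, which agrees with the figures quoted for this document.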
As a final example, for document 1978, one of the two search terms (Flow) is
shown to be present in natural language at a weight of 7, as a synonym (A-6) and also
as two quasi-synonyms (K-7 and M-9). All these, of course, only count as a coordinate
score of one since they are all separate alternatives to one of the search terms, but the
last quasi-synonym (M-9) is important because it is the only term at low exhaustivity.
The coordination scores for this document in table 6.3 are 1 for index languages 1 to 4,
and 2 for languages 5 and 6, with exhaustivity reducing these scores as shown.
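
The rule that several alternatives to one search term count only once can be sketched as follows. The function name `term_matches` is hypothetical, and the sets of accepted codes passed to it are illustrative rather than definitions of particular index languages. The codes and weights are those quoted for the term Flow in document 1978.

```python
# Postings for the search term "Flow" in document 1978, as (code, weight):
# natural language at 7, synonym A at 6, quasi-synonyms K at 7 and M at 9.
FLOW_POSTINGS = [("1", 7), ("A", 6), ("K", 7), ("M", 9)]


def term_matches(postings, accepted_codes, min_weight):
    """Return 1 if any admissible alternative is posted, else 0."""
    return int(any(code in accepted_codes and weight >= min_weight
                   for code, weight in postings))


if __name__ == "__main__":
    all_codes = {"1", "A", "K", "M"}
    # Four alternatives are admissible at high exhaustivity (weights 5-10),
    # yet the term adds only one to the coordination score.
    print(term_matches(FLOW_POSTINGS, all_codes, 5))
    # At low exhaustivity (weights 9 or 10) only M-9 remains.
    print(term_matches(FLOW_POSTINGS, all_codes, 9))
    # Without the quasi-synonym codes nothing is left at low exhaustivity.
    print(term_matches(FLOW_POSTINGS, {"1", "A"}, 9))
```

The first two calls give 1 and the third gives 0, which shows why M-9 is the only posting that keeps Flow in play at low exhaustivity.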
Since the search rules at this stage allowed any combination of terms to be accepted,
it was never necessary to note which search terms occurred. Some combinations
accepted were obviously nonsense, e.g. document 1982 retrieved by the starting terms
Nature and Compressible is not meaningful, and is even worse when the quasi-synonyms