CRANV1P1
ASLIB Cranfield Research Project: Factors Determining the Performance of Indexing Systems: VOLUME 1. Design, Part 1. Text
Testing Techniques
chapter
Cyril Cleverdon
Jack Mills
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
The 361 questions which it was proposed to use for searching produced a
total of 723 different terms, and these became known as 'starting terms'. As
such they were terms used in the questions without being subjected to any controls,
and were equivalent to the natural language index terms. For each starting term a
set of sheets was provided, these sheets bearing the document numbers 1001-2400.
As an example, consider the starting term 'Flow'. The pack of cards which had
been posted with this term was taken, and the information transferred from the
cards to the set of sheets. The code 1 was used to denote that it was the actual
search term (i.e. Flow) that was being posted and Figure 6.2, which is an extract
from the set of sheets dealing with 'Flow', shows that a large number of documents
were indexed by this term. In particular it can be seen that document 1933 was
indexed by Flow at a weight of 9, as were documents 1939, 1940 and 1941.
Document 1942 was also indexed by Flow, but on this occasion the weighting is 8.
After all the indexing by Flow had been entered, additional entries were made for
terms related to Flow. The authority sheet for this is shown in Fig. 6.3, from
which it can be seen that Flux and Stream are considered as synonyms. The
packs of cards posted for these terms would be taken, and entered on the sheets
for Flow. Referring to Fig. 6.2, it will be seen that, for example, document
1978 is marked A6. This indicates that Flux (which is coded A in Fig. 6.3)
was indexed in this document at a weight of 6, while document 1974 is one of
several that were coded by Stream (B). The variant word ending, Flowing (coded E),
was used in document 1968; of the quasi-synonyms shown in Fig. 6.3, Motion
(K) and Moving (M) are examples which both appear in document 1978. It will
be noted that multiple posting can occur on one document number; 1978 has, in
addition to Motion and Moving, also been posted with Flow and Flux. The reason
for doing this will be explained later.
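The posting procedure just described can be pictured as a small lookup table. The following is only an illustrative sketch in Python, not part of the original manual procedure: the document numbers, codes and weights are those quoted in the text, a weight of None marks a posting whose weight the text does not state, and the names FLOW_POSTINGS and build_sheet are hypothetical.

```python
from collections import defaultdict

# Postings for the starting term 'Flow', as far as the text states them.
# Code "1" = the starting term itself, "A" = Flux, "B" = Stream,
# "E" = Flowing, "K" = Motion, "M" = Moving.
# A weight of None means the text does not give the weight.
FLOW_POSTINGS = [
    (1933, "1", 9),
    (1939, "1", 9),
    (1940, "1", 9),
    (1941, "1", 9),
    (1942, "1", 8),
    (1968, "E", None),
    (1974, "B", None),
    (1978, "1", None),
    (1978, "A", 6),
    (1978, "K", None),
    (1978, "M", None),
]


def build_sheet(postings):
    """Group (document, code, weight) postings by document number."""
    sheet = defaultdict(list)
    for doc, code, weight in postings:
        sheet[doc].append((code, weight))
    return dict(sheet)


flow_sheet = build_sheet(FLOW_POSTINGS)
# Document 1978 carries four postings on the one sheet line.
print(flow_sheet[1978])
```

Keeping every posting for a document, rather than only the first, mirrors the multiple posting noted for document 1978.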
The completion of this meant that there now existed a record of every time
the starting term Flow or any of its synonyms, word endings and quasi-synonyms
had been used as index terms. Since the codes for these were always kept constant
(A-D for synonyms, E-J for word endings and K-Z for quasi-synonyms), the staff
always knew to which group any particular entry belonged.
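Because the letter ranges were fixed, the group of any entry follows from its code alone. A minimal sketch of that rule in Python is given below; the function name code_group and the returned labels are hypothetical, and it is assumed that an entry consists of a code character optionally followed by a weight, as in the A6 example.

```python
def code_group(entry: str) -> str:
    """Return the group of a posting entry such as '1', 'A6' or 'K'.

    Ranges are those given in the text: 1 for the starting term itself,
    A-D synonyms, E-J word endings, K-Z quasi-synonyms.
    """
    if not entry:
        raise ValueError("empty entry")
    code = entry[0].upper()
    if code == "1":
        return "starting term"
    if "A" <= code <= "D":
        return "synonym"
    if "E" <= code <= "J":
        return "word ending"
    if "K" <= code <= "Z":
        return "quasi-synonym"
    raise ValueError(f"unrecognised code: {entry!r}")


for entry in ["1", "A6", "B", "E", "K", "M"]:
    print(entry, "->", code_group(entry))
```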
The posting had been done on foolscap sheets and these were now cut into
narrow strips, ¼ in. wide, each strip being serially numbered so as to maintain
the document sequence order. These sets of strips were then filed in two specially
constructed 'beehive' cabinets (Fig. 6.4).
In effect, a separate index was now compiled for each question by the
preparation of a set of search sheets. The production of these in relation to a
particular question was controlled by the question starting term card, an example
for question 181 being shown in Fig. 6.5. This listed the starting terms for the
question and the order of the terms on the search sheets, this order being of
importance in relation to some of the searching options. To prepare the search
sheets, the sets of strips for each of the starting terms were obtained and
assembled one page at a time by being clipped to a set of 23 prepared boards.
These boards showed the document numbers at the extreme sides, and the strips
were arranged in correct alignment with the numbers. When all 23 boards had
been thus prepared, a xerox copy was made of each board; the result is shown in
Fig. 6.6, which illustrates one of the 23 sheets for question 181 in relation to
documents 1931-1992.
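The arithmetic of the boards can be checked with a short sketch. It rests on an assumption that the text does not state outright: that every board carried the same number of document lines, namely the 62 implied by the range 1931-1992. Under that assumption documents 1001-2400 need exactly 23 boards, which agrees with the text. The function names, and the second starting term 'Pressure' in the example, are hypothetical; the starting terms of question 181 are not given here.

```python
import math

FIRST_DOC, LAST_DOC = 1001, 2400
DOCS_PER_BOARD = 62  # assumption, inferred from the sheet range 1931-1992


def board_count() -> int:
    """Number of boards needed to cover the whole document range."""
    return math.ceil((LAST_DOC - FIRST_DOC + 1) / DOCS_PER_BOARD)


def board_range(board: int) -> tuple:
    """First and last document number on a board (boards numbered from 1)."""
    start = FIRST_DOC + (board - 1) * DOCS_PER_BOARD
    end = min(start + DOCS_PER_BOARD - 1, LAST_DOC)
    return start, end


def board_for(doc: int) -> int:
    """Board on which a given document number appears."""
    return (doc - FIRST_DOC) // DOCS_PER_BOARD + 1


def assemble_board(board, ordered_terms, strips):
    """Lay the strips for each starting term side by side, in card order.

    strips maps a starting term to {document number: [entries]}.
    The result maps each document on the board to one cell per term.
    """
    start, end = board_range(board)
    return {
        doc: [strips.get(term, {}).get(doc, []) for term in ordered_terms]
        for doc in range(start, end + 1)
    }


# Hypothetical two-term question; only the Flow entry comes from the text.
strips = {"Flow": {1978: ["A6"]}}
sheet = assemble_board(board_for(1978), ["Flow", "Pressure"], strips)
print(board_count(), board_range(16), sheet[1978])
```

The order of ordered_terms fixes the order of the columns, which corresponds to the term order recorded on the question starting term card.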