CRANV1P1 ASLIB Cranfield Research Project: Factors Determining the Performance of Indexing Systems: Volume 1. Design, Part 1. Text. Testing Techniques (chapter). Cyril Cleverdon, Jack Mills, Michael Keen. Cranfield. An investigation supported by a grant to Aslib by the National Science Foundation. Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

The 361 questions which it was proposed to use for searching produced a total of 723 different terms, and these became known as 'starting terms'. As such they were terms used in the questions without being subjected to any controls, and were equivalent to the natural language index terms. For each starting term a set of sheets was provided, these sheets bearing the document numbers 1001-2400.

As an example, consider the starting term 'Flow'. The pack of cards which had been posted with this term was taken, and the information transferred from the cards to the set of sheets. The code 1 was used to denote that it was the actual search term (i.e. Flow) that was being posted, and Fig. 6.2, which is an extract from the set of sheets dealing with 'Flow', shows that a large number of documents were indexed by this term. In particular it can be seen that document 1933 was indexed by Flow at a weight of 9, as were documents 1939, 1940 and 1941. Document 1942 was also indexed by Flow, but on this occasion the weighting was 8.

After all the indexing by Flow had been entered, additional entries were made for terms related to Flow. The authority sheet for this is shown in Fig. 6.3, from which it can be seen that Flux and Stream are considered as synonyms. The packs of cards posted for these terms would be taken, and entered on the sheets for Flow. Referring to Fig. 6.2, it will be seen that, for example, document 1978 is marked A6. This indicates that Flux (which is coded A in Fig.
6.3) was indexed in this document at a weight of 6, while document 1974 is one of several that were coded by Stream (B). The variant word ending, Flowing (coded E), was used in document 1968; of the quasi-synonyms shown in Fig. 6.3, Motion (K) and Moving (M) are examples which both appear in document 1978. It will be noted that multiple posting can occur on one document number; 1978 has, in addition to Motion and Moving, also been posted with Flow and Flux. The reason for doing this will be explained later.

The completion of this meant that there now existed a record of every time the starting term Flow or any of its synonyms, word endings and quasi-synonyms had been used as index terms. Since the codes for these were always kept constant (A-D for synonyms, E-J for word endings and K-Z for quasi-synonyms), the staff always knew to which group any particular entry belonged.

The posting had been done on foolscap sheets and these were now cut into narrow strips, ¼ in. wide, each strip being serially numbered so as to maintain the document sequence order. These sets of strips were then filed in two specially constructed 'beehive' cabinets (Fig. 6.4).

In effect, a separate index was now compiled for each question by the preparation of a set of search sheets. The production of these in relation to a particular question was controlled by the question starting term card, an example for question 181 being shown in Fig. 6.5. This listed the starting terms for the question and the order of the terms on the search sheets, this order being of importance in relation to some of the searching options.

To prepare the search sheets, the sets of strips for each of the starting terms were obtained and assembled one page at a time by being clipped to a set of 23 prepared boards. These boards showed the document numbers at the extreme sides, and the strips were arranged in correct alignment with the numbers.
When all 23 boards had been thus prepared, a xerox copy was made of each board; the result is shown in Fig. 6.6, which illustrates one of the 23 sheets for question 181 in relation to documents 1931-1992.
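
The posting codes and the assembly of strips into search sheets can be modelled in a few lines of Python. What follows is only an illustrative sketch of the manual procedure, not a record of anything used in the project. The postings are those quoted above for 'Flow'; where the text gives no weight, the weight is left as None instead of being guessed. The figure of 62 document numbers per board is an assumption inferred from the range 1931-1992 of Fig. 6.6, and the names code_group, board_range, search_sheet and the placeholder term TERM_2 are hypothetical.

```python
import math
from typing import Optional

FIRST_DOC, LAST_DOC = 1001, 2400  # document numbers borne by the sheets (from the text)
DOCS_PER_BOARD = 62               # ASSUMPTION: inferred from the 1931-1992 range of Fig. 6.6

Posting = tuple[str, Optional[int]]  # (code, weight); weight is None where the text gives none


def code_group(code: str) -> str:
    """Return the group of a posting code, using the constant ranges given in the text."""
    if code == "1":
        return "starting term"
    if len(code) == 1:
        if "A" <= code <= "D":
            return "synonym"
        if "E" <= code <= "J":
            return "word ending"
        if "K" <= code <= "Z":
            return "quasi-synonym"
    raise ValueError(f"unknown posting code: {code!r}")


# Strips for the starting term 'Flow': document number -> postings quoted in the text.
flow_strip: dict[int, list[Posting]] = {
    1933: [("1", 9)],
    1939: [("1", 9)],
    1940: [("1", 9)],
    1941: [("1", 9)],
    1942: [("1", 8)],
    1968: [("E", None)],  # Flowing
    1974: [("B", None)],  # Stream
    1978: [("1", None), ("A", 6), ("K", None), ("M", None)],  # Flow, Flux, Motion, Moving
}


def board_range(board: int) -> tuple[int, int]:
    """Document range of a 1-based board number, under the 62-per-board assumption."""
    start = FIRST_DOC + (board - 1) * DOCS_PER_BOARD
    return start, min(start + DOCS_PER_BOARD - 1, LAST_DOC)


def search_sheet(term_order, strips, board):
    """One row per document number, one column per starting term, in card order."""
    start, end = board_range(board)
    return {
        doc: [strips.get(term, {}).get(doc, []) for term in term_order]
        for doc in range(start, end + 1)
    }


n_boards = math.ceil((LAST_DOC - FIRST_DOC + 1) / DOCS_PER_BOARD)

# TERM_2 is a hypothetical second starting term with no postings, shown only
# to illustrate that column order follows the question starting term card.
sheet = search_sheet(["Flow", "TERM_2"], {"Flow": flow_strip}, board=16)
groups_1978 = [code_group(code) for code, _ in sheet[1978][0]]
```

Under the stated assumption the 1400 document numbers fill 23 boards, and the sixteenth board covers documents 1931-1992, which agrees with the 23 boards and the range of Fig. 6.6 mentioned in the text. The row for document 1978 carries four postings in the Flow column, one each for the starting term, a synonym and two quasi-synonyms.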