CRANV1P1
ASLIB Cranfield Research Project: Factors Determining the Performance of Indexing Systems: VOLUME 1. Design, Part 1. Text
Testing Techniques
chapter
Cyril Cleverdon
Jack Mills
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
associated with a starting term that was different from the group of synonyms, word
endings and quasi-synonyms described earlier. There were some minor modifications
in preparing the indexes, but in general the basic procedure described above was used
for this further testing.
There was the additional necessity of investigating, on the single terms, the
precision devices of interfixing and partitioning, which, as described earlier, are
the two stages of links which were recognised, interfixing being concerned with
single terms within a concept, while partitioning deals with concepts within a theme.
This operation was done by examining the original indexing sheets for the relevant
and non-relevant documents that had been retrieved as a result of the searches
described above.
To illustrate the procedure adopted, Fig. 6.9 shows the processing of one of
the relevant documents (2076) to question 51. This question has eleven starting terms;
these are set out at the top of the table, with the double dividing lines indicating the
concepts into which the question terms are divided, namely Displacement-Thickness;
Plate-Flat; Flow-Compressible; Boundary-Layer-Laminar; Formula-Approximate.
These concepts are the pairs and triplets of terms which must be interfixed within
concepts. In testing partitioning, all the terms in the search are required to occur
in one theme of the indexing. Each asterisked term in Fig. 6.9 is the basic term in
its concept, and the search rules in operation at this stage of the test demanded that
no subsidiary term (i.e. non-asterisked term) would be accepted unless the basic
term was present. Thus in the index terms contained in document 2076 listed in the
second row, the last term Approximate is not accepted, since Formula is not present.
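
The acceptance rule just described can be illustrated with a minimal Python sketch. The function name accept_terms and the (basic term, subsidiary terms) layout are assumptions made for this illustration and do not come from the original test procedure. Only the Formula-Approximate concept has its basic term identified in the text; treating Thickness as the basic term of Displacement-Thickness is a guess, made only so that the example has a second concept to contrast with.

```python
def accept_terms(concepts, matched_terms):
    """Apply the basic-term rule to the terms matched in one document.

    concepts: list of (basic_term, [subsidiary_terms]) pairs (assumed layout).
    matched_terms: set of question terms found in the document's indexing.
    A subsidiary term is accepted only if its basic term is also present.
    """
    accepted = set()
    for basic, subsidiaries in concepts:
        if basic not in matched_terms:
            continue  # no basic term: every subsidiary of this concept is rejected
        accepted.add(basic)
        accepted.update(t for t in subsidiaries if t in matched_terms)
    return accepted


concepts = [
    ("Thickness", ["Displacement"]),  # choice of basic term is an assumption
    ("Formula", ["Approximate"]),     # Formula is the basic term, per the text
]
# Terms of these two concepts found in document 2076: Formula is absent.
matched = {"Displacement", "Thickness", "Approximate"}

accepted = accept_terms(concepts, matched)
print(sorted(accepted))  # ['Displacement', 'Thickness']
```

Approximate is dropped because Formula does not occur among the matched terms, which is the outcome reported for document 2076.
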
The second row shows all the index terms in document 2076 that match the terms
requested in the search prescription, with the weights in brackets; this information
results from decoding the entries on the search sheet. The index sheet of document
2076 (Fig. 6.1) is examined next: the index terms in row 2 are located in the indexing,
and the code letters assigned to the concepts in the indexing are recorded in the third
row. The first two terms, Displacement and Thickness, both occur in concept i, and
therefore are interfixed; the fourth and fifth terms, Flow and Compressible, occur
respectively in concepts d and e, so no interfixing is present. However, an alternative
quasi-synonym acceptable in place of Compressible is Hypersonic; this occurs in
concept d and thus interfixes with Flow. The fourth row shows the themes from the
indexing that contain the greatest number of search terms; theme 02 does not include
'Displacement Thickness'; while theme 54 has this concept, it does not include 'Plate
Flat', so both themes give the same results, since both eliminate one concept of two
terms. From this data the results can be calculated for interfixing, for partitioning
and for partitioning with interfixing, in all of the six index languages and at the three
levels of exhaustivity. The results for this single document in regard to these devices
are shown on the score sheet (Fig. 6.10). This procedure was carried out on all the
relevant documents in the questions tested, and several of the non-relevant
documents were also examined. The totals of relevant and non-relevant documents for a
question are again recorded on a results sheet as before, and from this can be seen
the effect on recall and precision of these powerful precision devices.
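
The interfixing and partitioning checks worked through for document 2076 can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure as it was carried out by hand on the index sheets: the function names and data layout are hypothetical. The concept codes i, d and e are those given in the text. The theme contents are illustrative, since the text states only which concept each theme lacks; the presence of the Boundary-Layer-Laminar terms in both themes is assumed.

```python
def concept_is_interfixed(slots, codes_of):
    """Check whether one question concept is interfixed in a document.

    slots: one set per question term, holding the term and any acceptable
           alternatives (e.g. a quasi-synonym).
    codes_of: term -> set of concept code letters from the index sheet.
    The concept is interfixed if a single code letter covers every slot.
    """
    per_slot = []
    for alternatives in slots:
        codes = set()
        for term in alternatives:
            codes |= codes_of.get(term, set())
        per_slot.append(codes)
    return bool(per_slot) and bool(set.intersection(*per_slot))


def best_theme(search_terms, themes):
    """Return (theme id, matched terms) for the theme holding most search terms."""
    best = max(themes, key=lambda t: len(search_terms & themes[t]))
    return best, search_terms & themes[best]


# Concept codes for document 2076 as reported in the text.
codes_of = {
    "Displacement": {"i"}, "Thickness": {"i"},
    "Flow": {"d"}, "Compressible": {"e"}, "Hypersonic": {"d"},
}

disp_thick = concept_is_interfixed([{"Displacement"}, {"Thickness"}], codes_of)
flow_comp = concept_is_interfixed([{"Flow"}, {"Compressible"}], codes_of)
flow_comp_alt = concept_is_interfixed(
    [{"Flow"}, {"Compressible", "Hypersonic"}], codes_of
)
print(disp_thick, flow_comp, flow_comp_alt)  # True False True

# Accepted search terms (Approximate is excluded because Formula is absent).
search_terms = {
    "Displacement", "Thickness", "Plate", "Flat", "Flow",
    "Compressible", "Boundary", "Layer", "Laminar",
}
# Illustrative theme contents: 02 lacks Displacement-Thickness, 54 lacks Plate-Flat.
themes = {
    "02": search_terms - {"Displacement", "Thickness"},
    "54": search_terms - {"Plate", "Flat"},
}
theme_id, matched = best_theme(search_terms, themes)
print(theme_id, len(matched))  # 02 7
```

Both themes retain seven of the nine accepted terms, so either choice gives the same partitioning result; the sketch simply returns the first theme listed when there is a tie.
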
The testing of the simple concepts involved more index languages than the single
terms, since 16 aggregates of recall devices were tested. In this case the code
letters used in the columns were each allotted to a single device, rather than a group
of letters to a device (e.g. B was synonyms, C was species, so that even if there
were five synonyms or five species, they were all coded with B or C). This was done
not only because of the large number of separate results wanted, but because the search