CRANV1P1 ASLIB Cranfield Research Project: Factors Determining the Performance of Indexing Systems: VOLUME 1. Design, Part 1. Text Testing Techniques chapter Cyril Cleverdon Jack Mills Michael Keen Cranfield An investigation supported by a grant to Aslib by the National Science Foundation. Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

CHAPTER 6

Testing Techniques

The choice of the physical method to be used for searching was important, but difficult to make. Since the work was entirely concerned with index languages, it was essential that the physical form of the index should in no way impede the investigation by introducing any controls or restrictions of its own. Although it was not possible to forecast exactly the many different tests that would be made, it was clear that for each question there would be the necessity of obtaining several hundred sets of performance figures.

It was decided that a small test should be made soon after the project had commenced; this was to be done partly to check the indexing procedures but also to validate the proposed design of the tests and to provide experience that would assist in deciding on the physical form of the index. For this pilot test, 116 documents had been indexed, and fourteen questions were available for searching, for which there were 26 known relevant documents. It was planned to investigate five sets of recall devices and four sets of precision devices, based on the single-term, natural language indexing. These variables alone appeared likely to result in some 80 searches for every question, and when other variables were added in the main test, the potential number of different searches could run into several hundreds.
It was unlikely that every combination of the various devices would be required, but the method used had to be flexible enough to provide for all possible variations of searches, since it would only be after some searches had been made that it would be known which were unnecessary.

Co-ordination was certainly the basic precision device, and some form of post-coordinate index was clearly required. For the pilot test, the decision was taken to prepare a peek-a-boo type index. This was done in a conventional way, but a complication arose due to the fact that, at this stage of the work, six different indexing weights were being used, and, to investigate the effect of these, it was necessary to have, for every term, six cards each of which represented a different weighting. The first search for a given question was carried out on the natural language terms. Subsequent searches were made bringing in the various recall devices and precision devices; the nature of these searches is considered in more detail later in this chapter.

The results of this test were interesting in themselves, but the main objective had been to obtain information on the techniques being used. In this respect, the test showed that the general test theory was reasonable and that the indexing was satisfactory for the objectives of the test. Quite definitely, however, it showed that a peek-a-boo index would be quite unsatisfactory for the main test. This was because much of the testing involved use of increasingly large numbers of terms in the search as the recall devices were tested, with the continual need for co-ordination of all the different combinations. For example, if a question had five terms searched on initially, and each of the five terms had one synonym, two word forms and four quasi-synonyms, then in coordination of all five terms using all the recall devices, 32,768 different combinations are possible.
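The figure of 32,768 can be checked with a short calculation. The sketch below assumes that the original term counts as one alternative alongside its synonym, two word forms and four quasi-synonyms, giving eight alternatives per term and 8^5 combinations for five terms. The term labels and the helper `variants` are hypothetical placeholders; only the counts come from the text.

```python
from itertools import product


def variants(term):
    """Return the alternatives for one search term (hypothetical labels).

    Assumed breakdown: the term itself, one synonym, two word forms
    and four quasi-synonyms, i.e. 1 + 1 + 2 + 4 = 8 alternatives.
    """
    return (
        [term]
        + [f"{term}-synonym"]
        + [f"{term}-wordform{i}" for i in (1, 2)]
        + [f"{term}-quasi{i}" for i in (1, 2, 3, 4)]
    )


# Five placeholder question terms; the source does not name them.
question_terms = ["t1", "t2", "t3", "t4", "t5"]
alternatives = [variants(t) for t in question_terms]

per_term = len(alternatives[0])

# Enumerate every way of choosing one alternative for each of the five terms.
combinations = sum(1 for _ in product(*alternatives))

print(per_term, combinations)  # 8 32768
```

Enumerating the combinations rather than only computing 8 ** 5 makes the point of the passage concrete: each of these combinations would be a separate co-ordination to carry out on a peek-a-boo index.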
After this, it would be necessary to search for any four of the five sets of terms, then any three and so on. It is true that by use of the lowest posted terms first, the number of coordinations to be done can be reduced considerably, but the use of natural language