CRANV2
Aslib Cranfield Research Project: Factors Determining the Performance of Indexing Systems: Volume 2
Conclusions
chapter
Cyril Cleverdon
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
search on abstracts. Intermediate were the three levels of indexing done
by the project staff. Figure 8.2T shows the normalised recall ratios
obtained in these five cases, all using natural language terms.
Index Language                            Average No.    Normalised
                                          of Terms       Recall Ratio
Titles                                         7           59.76%
Level 1 Single Term Natural Language          14           62.88%
Level 2 Single Term Natural Language          22           63.57%
Level 3 Single Term Natural Language          33           65.00%
Abstracts                                 Approx. 60       60.94%

FIGURE 8.2T NORMALISED RECALL RATIOS FOR FIVE LEVELS OF EXHAUSTIVITY
There is the possibility that the terms selected by the indexer
were more descriptive of the document content than the terms used in
the titles and the abstracts, but the main variable in these five results
concerns the level of indexing exhaustivity. It would seem that while
the titles were at too low a level of exhaustivity, the gradual increase in
the level, up to an average of 33 terms, brought about an improvement in
performance. However, the higher level of exhaustivity represented by the
abstracts (probably about 60 terms per document) was too high, resulting in
the retrieval of large numbers of additional non-relevant documents, so that
the performance only represented a slight improvement on that obtained with
titles. This hypothesis is supported by the effect with titles and abstracts
of enlarging the classes by the use of word forms. With titles, where it
has been shown that the level of exhaustivity is too low, the use of word
forms improves the normalised recall ratio from 58.94% to 59.76%. With
abstracts, however, no such improvement is noted; already there are too
many terms and the use of word forms results in a fall from 60.94% to
60.82%. Admittedly this in itself cannot be considered a significant change,
but taken in the context of the other results, it appears to be of some
importance.
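
The arithmetic behind this argument can be laid out in a short sketch. The figures are those quoted in Figure 8.2T and in the paragraph above, with 33 terms taken as the highest indexing level and about 60 terms as the abstracts, as the text states. The names `levels`, `step_changes` and `word_forms` are illustrative only and do not come from the project.

```python
# Values transcribed from Figure 8.2T and the surrounding text.
# Each row: (label, average number of terms, normalised recall ratio in %).
levels = [
    ("Titles", 7, 59.76),
    ("Indexing level 1", 14, 62.88),
    ("Indexing level 2", 22, 63.57),
    ("Indexing level 3", 33, 65.00),
    ("Abstracts", 60, 60.94),  # the figure gives "approx. 60"
]


def step_changes(rows):
    """Change in normalised recall (percentage points) between adjacent levels."""
    return [
        (lower[0], higher[0], round(higher[2] - lower[2], 2))
        for lower, higher in zip(rows, rows[1:])
    ]


# Effect of word forms as quoted in the text: (without, with).
word_forms = {"Titles": (58.94, 59.76), "Abstracts": (60.94, 60.82)}
word_form_effect = {
    name: round(with_forms - without, 2)
    for name, (without, with_forms) in word_forms.items()
}

# Level of exhaustivity with the highest normalised recall ratio.
best = max(levels, key=lambda row: row[2])

for lower, higher, delta in step_changes(levels):
    print(f"{lower} -> {higher}: {delta:+.2f} points")
for name, delta in word_form_effect.items():
    print(f"Word forms, {name}: {delta:+.2f} points")
print(f"Best level: {best[0]} ({best[1]} terms, {best[2]:.2f}%)")
```

On these figures every step up to 33 terms is a gain, the step from 33 to about 60 terms is a loss, and word forms help only at the lowest level of exhaustivity.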
The compilation of the dictionaries or schedules was done, in the
main, by Mr. Jack Mills. Although there can be few people more
competent in such work, there can obviously be no guarantee but that different
classes in the Single Term index languages might have given an improved
performance as compared to natural language. However, it seems unlikely
that the classes prepared for the Simple Concept index languages could have
been solely responsible for the relatively poor performance as compared to
the Single Term index languages. With the Controlled Term index languages,
the classes of terms were formed on the basis of groupings given in the
Thesaurus of Engineering Terms of the Engineers Joint Council, yet the use
of any groupings except Narrower Terms (Index Language III.2.a) resulted
in a loss of performance.
In Chapter 3, the statement was made that for any given question, the
total number of postings of the search terms of that question must be equal
to the total number of retrievals at the various coordination levels. To
explain this point with a simple example, assume the search programme is