CRANV2
Aslib Cranfield Research Project
Factors Determining the Performance of Indexing Systems
Volume 2
Chapter: Test Environment
Cyril Cleverdon, Michael Keen
Cranfield

An investigation supported by a grant to Aslib by the National Science Foundation. Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.


SIMPLE CONCEPTS

Collection size                        200 documents
Total terms in vocabulary              2,798
Average postings per document          18


CONTROLLED TERMS

Collection size                        200 documents    350 documents
Total terms in vocabulary              816              985
  Terms in E.J.C. Thesaurus            694              827
  Additional terms                     122              158
Added lead-in vocabulary terms         1,285            1,514
Average postings per document          24               24

FIGURE 2.3  DATA CONCERNING USAGE OF TERMS IN SIMPLE CONCEPT AND CONTROLLED TERM INDEX LANGUAGES


COLLECTION SIZE                                                      200 abstracts

Total postings of all words                                          33,042
Total postings of words less those on restriction list               14,783
Distinct words on restriction list                                   204
Distinct words not on restriction list                               3,123
Average postings of all words per document                           165
Average postings of words not on restriction list per document       74

First ten terms ranked by usage:
FLOW, NUMBER, MACH, PRESSURE, RESULTS, WING, EFFECTS, SHOCK, BOUNDARY, LAYER

FIGURE 2.4  DATA CONCERNING USAGE OF WORDS IN ABSTRACTS
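
The derived figures in Figures 2.3 and 2.4 can be cross-checked with simple arithmetic. The following is a minimal sketch for illustration only: the variable and function names are hypothetical, and it assumes that the per-document averages were rounded to the nearest whole number and that the controlled-term vocabulary total is the sum of the thesaurus terms and the additional terms.

```python
# Values transcribed from Figures 2.3 and 2.4.
# All names below are illustrative assumptions, not taken from the report.

# Controlled terms, keyed by collection size (number of documents).
controlled_terms = {
    200: {"thesaurus": 694, "additional": 122, "total": 816},
    350: {"thesaurus": 827, "additional": 158, "total": 985},
}

# Word usage in the 200 abstracts.
ABSTRACTS = 200
TOTAL_POSTINGS_ALL_WORDS = 33_042
TOTAL_POSTINGS_UNRESTRICTED = 14_783


def rounded_average(total_postings: int, documents: int) -> int:
    """Average postings per document, rounded to the nearest integer (assumed)."""
    return round(total_postings / documents)


def vocabulary_total(thesaurus: int, additional: int) -> int:
    """Vocabulary size as thesaurus terms plus additional terms (assumed)."""
    return thesaurus + additional


# Check that each stated vocabulary total equals the sum of its two parts.
vocabulary_consistent = all(
    vocabulary_total(row["thesaurus"], row["additional"]) == row["total"]
    for row in controlled_terms.values()
)

avg_all_words = rounded_average(TOTAL_POSTINGS_ALL_WORDS, ABSTRACTS)
avg_unrestricted = rounded_average(TOTAL_POSTINGS_UNRESTRICTED, ABSTRACTS)

print("Vocabulary totals consistent:", vocabulary_consistent)
print("Average postings, all words:", avg_all_words)
print("Average postings, words not on restriction list:", avg_unrestricted)
```

Under these assumptions the computed averages agree with the stated values of 165 and 74, and both vocabulary totals (816 and 985) equal the sum of their parts.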