ASLIB Cranfield Research Project
Factors Determining the Performance of Indexing Systems
VOLUME 1. Design, Part 1. Text
Formation of Index Languages
Cyril Cleverdon, Jack Mills, Michael Keen
Cranfield

An investigation supported by a grant to Aslib by the National Science Foundation. Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

Collection size                              1400 documents
Total postings of terms                      43,857
Average postings per document                31.3
Total unique terms                           3094

Variations in exhaustivity
                                             Total terms      Average postings
                                             in vocabulary    per document
Maximum exhaustivity (all weights)           3094             31.3
Medium exhaustivity (Weights 7-10)           2668             25.2
Minimum exhaustivity (Weights 9-10)          1816             12.9

Use of terms
Average usage per term                       14.2
Terms used once only                         1169
Terms used more than once                    1925

The first ten terms, ranked by usage:
Flow (942)
Pressure (720)
Boundary (512)
Layer (512)
Distribution (442)
Theory (400)
Velocity (360)
Supersonic (352)
Mach (344)
Equation (312)

Variations in vocabulary size (according to different index languages)
Language 1 (Natural language, single terms only)                    3094
Language 2 (Lang. 1 with synonyms confounded)                       2988
Language 3 (Lang. 1 with word forms confounded)                     2541
Language 4 (Lang. 1 with synonyms and word forms confounded)        2444
Language 7 (Lang. 1 with minimum hierarchical reduction)            1217
Language 8 (Lang. 1 with medium hierarchical reduction)             796
Language 9 (Lang. 1 with maximum hierarchical reduction)            306
(383 proper names are not included in the counts for Languages 7, 8 and 9)

FIGURE 5.1 NATURAL LANGUAGE SINGLE TERM DATA
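
The derived averages in Figure 5.1 follow directly from the raw counts, and a short check makes the relationships explicit. The sketch below is illustrative only: the variable and function names are assumptions introduced here, not part of the report, and the figures are transcribed from Figure 5.1. The reduction percentages are computed for Languages 2 to 4 only, since the counts for Languages 7, 8 and 9 exclude 383 proper names and so are not directly comparable with Language 1.

```python
# Figures transcribed from Figure 5.1; all names below are illustrative.
collection_size = 1400          # documents
total_postings = 43_857         # total postings of terms
unique_terms = 3094             # total unique terms (Language 1)
terms_used_once = 1169
terms_used_more_than_once = 1925

# Average postings per document = total postings / number of documents.
avg_postings_per_document = total_postings / collection_size

# Average usage per term = total postings / number of unique terms.
avg_usage_per_term = total_postings / unique_terms

# Vocabulary sizes for the languages whose counts are directly comparable
# with Language 1 (Languages 7 to 9 omit proper names, so they are left out).
vocabulary_sizes = {1: 3094, 2: 2988, 3: 2541, 4: 2444}


def reduction_percent(size, baseline):
    """Percentage reduction in vocabulary size relative to the baseline."""
    return 100 * (baseline - size) / baseline


reductions = {
    language: round(reduction_percent(size, vocabulary_sizes[1]), 1)
    for language, size in vocabulary_sizes.items()
}

print(round(avg_postings_per_document, 1))
print(round(avg_usage_per_term, 1))
print(reductions)
```

Rounded to one decimal place, the two averages agree with the 31.3 and 14.2 given in the figure, and the once-only and more-than-once counts sum to the 3094 unique terms.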