CRANV1P1
ASLIB Cranfield Research Project: Factors Determining the Performance of Indexing Systems: VOLUME 1. Design, Part 1. Text
Formation of Index Languages
chapter
Cyril Cleverdon
Jack Mills
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
Collection size                          1400 documents
Total postings of terms                  43,857
Average postings per document            31.3
Total unique terms                       3094

Variations in exhaustivity
                                         Total terms      Average postings
                                         in vocabulary    per document
Maximum exhaustivity (all weights)       3094             31.3
Medium exhaustivity (Weights 7-10)       2668             25.2
Minimum exhaustivity (Weights 9-10)      1816             12.9
Use of terms
Average usage per term                   14.2
Terms used once only                     1169
Terms used more than once                1925
The first ten terms, ranked by usage:
    Flow (942)
    Pressure (720)
    Boundary (512)
    Layer (512)
    Distribution (442)
    Theory (400)
    Velocity (360)
    Supersonic (352)
    Mach (344)
    Equation (312)
Variations in vocabulary size (according to different index languages)
Language 1 (Natural language, single terms only)               3094
Language 2 (Lang. 1 with synonyms confounded)                  2988
Language 3 (Lang. 1 with word forms confounded)                2541
Language 4 (Lang. 1 with synonyms and word forms confounded)   2444
Language 7 (Lang. 1 with minimum hierarchical reduction)       1217
Language 8 (Lang. 1 with medium hierarchical reduction)         796
Language 9 (Lang. 1 with maximum hierarchical reduction)        306
(383 proper names are not included in the counts for Languages 7, 8 and 9)
FIGURE 5.1 NATURAL LANGUAGE SINGLE TERM DATA
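
The averages quoted in Figure 5.1 follow directly from its totals, and the vocabulary counts can be read as fractions of the Language 1 vocabulary. The short Python sketch below recomputes them as a consistency check. The variable names are illustrative and do not come from the report. The comparison with Language 1 for Languages 7, 8 and 9 is only approximate, since the figure notes that 383 proper names are excluded from those three counts.

```python
# Values transcribed from Figure 5.1; all names here are illustrative.
documents = 1400
total_postings = 43857
unique_terms = 3094

# Average postings per document = total postings / number of documents.
avg_postings_per_document = total_postings / documents

# Average usage per term = total postings / number of unique terms.
avg_usage_per_term = total_postings / unique_terms

# Terms used once and terms used more than once should sum to the vocabulary.
terms_used_once = 1169
terms_used_more_than_once = 1925

# Vocabulary size per index language, keyed by language number.
vocabulary_sizes = {1: 3094, 2: 2988, 3: 2541, 4: 2444, 7: 1217, 8: 796, 9: 306}

# Size of each vocabulary as a fraction of Language 1.
# Assumption: no adjustment is made for the 383 proper names that the
# figure excludes from Languages 7, 8 and 9.
relative_size = {
    language: size / vocabulary_sizes[1]
    for language, size in vocabulary_sizes.items()
}

print(round(avg_postings_per_document, 1))  # 31.3
print(round(avg_usage_per_term, 1))         # 14.2
for language, fraction in relative_size.items():
    print(f"Language {language}: {fraction:.1%} of Language 1")
```

The two rounded averages agree with the values printed in the figure (31.3 and 14.2), and the once-only and more-than-once counts sum to the 3094 unique terms.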