CRANV2
Aslib Cranfield Research Project: Factors Determining the Performance of Indexing Systems: Volume 2
Test Environment
chapter
Cyril Cleverdon
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
SIMPLE CONCEPTS
Collection size                     200 documents
Total terms in vocabulary           2,798
Average posting per document        18
CONTROLLED TERMS
Collection size                     200 documents    350 documents
Total terms in vocabulary           816              985
Terms in E.J.C. Thesaurus           694              827
Additional terms                    122              158
Added lead-in vocabulary terms      1,285            1,514
Average postings per document       24               24
FIGURE 2.3 DATA CONCERNING USAGE OF TERMS IN SIMPLE
CONCEPT AND CONTROLLED TERM INDEX LANGUAGES
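
The controlled-term figures in Figure 2.3 can be cross-checked arithmetically: for each collection size, the terms taken from the E.J.C. Thesaurus plus the additional terms should equal the total vocabulary. The following is a minimal sketch of that check, not part of the original report. The dictionary layout, key names, and function name are assumptions made for illustration, and the total for the 200-document collection is read as 816.

```python
# Figures transcribed from Figure 2.3 (controlled terms).
# The dictionary layout and key names are illustrative assumptions.
controlled_terms = {
    200: {"total": 816, "thesaurus": 694, "additional": 122},
    350: {"total": 985, "thesaurus": 827, "additional": 158},
}


def vocabulary_is_consistent(row):
    """Return True when thesaurus and additional terms sum to the total."""
    return row["thesaurus"] + row["additional"] == row["total"]


# Map each collection size to the result of the consistency check.
checks = {size: vocabulary_is_consistent(row)
          for size, row in controlled_terms.items()}
print(checks)  # {200: True, 350: True}
```

Both collection sizes pass the check, which supports the reading of the vocabulary totals as 816 and 985.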
COLLECTION SIZE                                                   200 abstracts
Total postings of all words                                       33,042
Total postings of words less those on restriction list            14,783
Distinct words on restriction list                                204
Distinct words not on restriction list                            3,123
Average postings of all words per document                        165
Average postings of words not on restriction list per document    74
First ten terms ranked by usage
FLOW
NUMBER
MACH
PRESSURE
RESULTS
WING
EFFECTS
SHOCK
BOUNDARY
LAYER
FIGURE 2.4 DATA CONCERNING USAGE OF WORDS IN ABSTRACTS
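
The two averages in Figure 2.4 appear to follow from dividing the posting totals by the collection size of 200 abstracts. The sketch below reproduces that arithmetic. The variable names are assumptions made for illustration, and rounding to the nearest whole number is an assumption about how the reported averages were derived.

```python
# Figures transcribed from Figure 2.4; variable names are illustrative.
collection_size = 200                # abstracts
total_postings_all = 33_042          # all words
total_postings_unrestricted = 14_783  # words not on the restriction list

# Assumed derivation: total postings divided by collection size,
# rounded to the nearest whole number.
avg_all = round(total_postings_all / collection_size)
avg_unrestricted = round(total_postings_unrestricted / collection_size)

print(avg_all, avg_unrestricted)  # 165 74
```

Under that assumption the computed values match the reported averages of 165 and 74 postings per document.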