Scientific Report No. IRS-13 Information Storage and Retrieval
An Experiment in Automatic Thesaurus Construction
R. T. Dattola
D. M. Murray
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
                                                                          THS 1   THS 2
Total number of concept classes                                             156     289
Avg. number of concepts per class                                           3.9     1.4
Number of concepts appearing in more than one concept class                 167      42
Number of concepts appearing in more than six concept classes                 3       0
Avg. number of classes per concept                                          2.1     1.2
Avg. standard deviation (S.D.) of concept frequencies per concept class     3.9     1.4
\[
\text{avg. S.D.} = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{1}{m_i} \sum_{j=1}^{m_i} \left| A_i - f_{ij} \right| \right)
\]
where
$n$ = total number of concept classes
$m_i$ = number of concepts in concept class $i$
$A_i$ = avg. frequency of concepts in concept class $i$
$f_{ij}$ = frequency of concept $j$ in concept class $i$
Statistics on Automatic Thesauruses
Fig. 5
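
The statistics in Fig. 5 can be illustrated with a short sketch. It assumes the "avg. S.D." is the mean, over all concept classes, of the average absolute deviation of each concept's frequency from its class average, which is how the formula in the figure appears to read. The data layout (a dictionary mapping each class to its concepts and their frequencies), the function name `thesaurus_stats`, and the toy data are hypothetical and do not come from the report.

```python
from collections import Counter


def thesaurus_stats(classes):
    """Compute Fig. 5-style statistics for a thesaurus.

    `classes` maps a class id to a dict of {concept: frequency}.
    This layout is an assumption made for illustration only.
    Every class is assumed to contain at least one concept.
    """
    n = len(classes)

    # Count how many classes each concept appears in.
    membership = Counter(
        concept for members in classes.values() for concept in members
    )
    total_memberships = sum(membership.values())

    # Per-class average absolute deviation of frequencies from the class mean.
    deviations = []
    for members in classes.values():
        freqs = list(members.values())
        m = len(freqs)
        class_avg = sum(freqs) / m
        deviations.append(sum(abs(class_avg - f) for f in freqs) / m)

    return {
        "classes": n,
        "avg_concepts_per_class": total_memberships / n,
        "concepts_in_multiple_classes": sum(
            1 for count in membership.values() if count > 1
        ),
        "avg_classes_per_concept": total_memberships / len(membership),
        "avg_sd": sum(deviations) / n,
    }


# Toy data (invented): two classes sharing the concept "b".
toy = {
    1: {"a": 2, "b": 4},
    2: {"b": 4, "c": 10, "d": 1},
}
stats = thesaurus_stats(toy)
```

In the toy data, class 1 has a deviation of 1 and class 2 a deviation of 10/3, so `stats["avg_sd"]` is their mean, about 2.17. A small value indicates that the concepts grouped in a class have similar frequencies.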