MONO91
NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report
Automatic Assignment Indexing Techniques
chapter
Mary Elizabeth Stevens
National Bureau of Standards
Discriminant coefficients were then computed at both the major and minor levels for
all words occurring in the sample items falling into one of the 20 groups in accordance
with the formula:
"The discriminant coefficient is:

    \[ \lambda_i = \sum_{j=1}^{n} \frac{(P_{ij} - \bar{P}_i)^2}{\bar{P}_i} \]

Where:

    \[ P_{ij} = \frac{f_{ij}}{\sum_{k=1}^{m} f_{kj}} \]

The relative frequency of the ith word in the jth category.

and

    \[ \bar{P}_i = \frac{1}{n} \sum_{j=1}^{n} P_{ij} \]

The mean relative frequency per category of the ith word." 1/
These coefficients are used both to set up threshold values to determine which words
should be used in the assignment formulas and to assign weighting factors to the words
themselves.
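The computation described above can be sketched in a few lines of Python. This is only an illustrative sketch, not Williams' program: it assumes the coefficient for word i is the sum over the n categories of the squared deviation of each relative frequency from the per-category mean, divided by that mean. The function names, the toy word counts, and the threshold value of 0.5 are all hypothetical.

```python
def discriminant_coefficients(category_counts):
    """Return {word: coefficient} for a list of per-category word counts.

    category_counts[j] maps each word i to its frequency f_ij in category j.
    Assumed form: lambda_i = sum_j (P_ij - mean_i)**2 / mean_i.
    """
    n = len(category_counts)
    # Total word occurrences in each category (denominator of P_ij).
    totals = [sum(counts.values()) for counts in category_counts]
    vocabulary = set().union(*category_counts)
    coefficients = {}
    for word in vocabulary:
        # Relative frequency of this word in each category.
        p = [
            counts.get(word, 0) / total if total else 0.0
            for counts, total in zip(category_counts, totals)
        ]
        mean = sum(p) / n
        if mean == 0:
            continue  # word never occurs with a positive count
        coefficients[word] = sum((p_ij - mean) ** 2 for p_ij in p) / mean
    return coefficients


def select_words(coefficients, threshold):
    """Keep words whose coefficient meets a (hypothetical) threshold."""
    return {w: c for w, c in coefficients.items() if c >= threshold}


# Toy sample: two categories with invented counts.
sample = [
    {"compiler": 3, "the": 1},    # e.g. a "Programming" category
    {"transistor": 3, "the": 1},  # e.g. a "Hardware Design" category
]
coeffs = discriminant_coefficients(sample)
kept = select_words(coeffs, threshold=0.5)
```

In this toy sample a word spread evenly across the categories ("the") receives a coefficient of zero and falls below the threshold, while words concentrated in one category are retained. The retained coefficients could then serve as the weighting factors.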
The results of the experiments to date are based on 83 items from the "reference
set" which were not used as source items. For 63 items, 78 percent were correctly
classified at the level of a single major category (e.g., "Programming", "Hardware
Design") and also correctly classified at a single subcategory level (e.g., "Programming
Languages", "Semiconductor Devices"). The 20 remaining items were classified
to one major category with an accuracy of 95 percent and to two minor level subdivisions
with accuracies of 60 percent and 75 percent. Additional investigations were made on
the effects of using a discrimination threshold to eliminate insignificant words from
consideration and on the use of weighting factors in the assignment calculations.
4.5 SADSACT
Stevens and Urban at the National Bureau of Standards (1963 [569, 570]) have also
explored an automatic indexing technique that uses, as in the experiments of Williams,
a teaching sample or reference set of previously indexed items to form patterns of word
and index-term assignment associations. However, there are much less formal requirements
for computing correlation coefficients and no consideration is required of either
1/ Williams 1963 [642], p. 163.