MONO91
NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report
Automatic Assignment Indexing Techniques
chapter
Mary Elizabeth Stevens
National Bureau of Standards
Table 2. Summary of Automatic Assignment Indexing Test Evaluations
(Columns: Investigator, Principles and Methods, Materials Used, Tests, Remarks)
Investigator: Maron
Principles and Methods: Statistical probabilities of association between clue words and pre-established subject categories. Source items manually indexed to 32 categories. A subclass of words occurring in the corpus selected as clue words, and statistical correlations obtained for 90 such words with categories assigned. Correlation data and Bayesian probabilities used to assign categories to new items.
Materials Used: Corpus of 405 items selected from computer abstracts, PGEC, 1959. Full text, 20,000 words of which 3,263 were different words.
Tests: Of 260 source items, 12 did not contain any clue words, 247 were indexed, and 1 contained an error preventing processing. For the 247 source items indexed, probability of the top-ranked category being correct was 84.6%. Of 145 new items, 20 were not indexed because they contained no clue words. In 85 cases where at least 2 clue words occurred, probability of correct category assignment = 51.8%.
Remarks: Considerable manual inspection and judgment involved in selection of clue words. Some new items cannot be processed because they contain no clue words.
Investigator: Borko
Principles and Methods: Factor analysis to determine distinctive groupings of clue words. Word frequency counts made; 90 of the most frequent non-common words manually selected. Correlation matrix computed, factors rotated and interpreted.
Materials Used: Psychological abstracts: 618 abstracts, 50,000 text words; 6,800 different words.
Tests: Factors selected were judged to be compatible with, but not identical to, subject classification terms used for these items by the American Psychological Association.
Remarks: Some new items cannot be processed, because they contain no clue words.
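The Bayesian assignment scheme summarized in the Maron row can be sketched as a small naive-Bayes classifier over clue words. This is an illustration only, not Maron's original 1961 implementation: the categories, clue words, and counts below are invented, and the add-one smoothing is a modern convenience.

```python
# Illustrative sketch of Maron-style Bayesian assignment indexing.
# Training pairs, category names, and clue words are all invented.
from collections import defaultdict

# Toy training data: (set of clue words found in item, manually assigned category)
training = [
    ({"programming", "compiler"}, "software"),
    ({"circuit", "transistor"}, "hardware"),
    ({"compiler", "syntax"}, "software"),
    ({"transistor", "amplifier"}, "hardware"),
]

# Count category occurrences and clue-word occurrences per category.
cat_counts = defaultdict(int)
word_counts = defaultdict(lambda: defaultdict(int))
for words, cat in training:
    cat_counts[cat] += 1
    for w in words:
        word_counts[cat][w] += 1

def assign_category(clue_words):
    """Rank categories by P(C) * prod P(w|C), with add-one smoothing."""
    vocab = {w for ws, _ in training for w in ws}
    scores = {}
    for cat, n in cat_counts.items():
        p = n / len(training)                     # prior P(C)
        for w in clue_words:
            p *= (word_counts[cat][w] + 1) / (n + len(vocab))
        scores[cat] = p
    # Items containing no known clue words cannot be processed,
    # mirroring the limitation noted in the table's Remarks column.
    return max(scores, key=scores.get) if clue_words & vocab else None

print(assign_category({"compiler", "syntax"}))  # expected: software
print(assign_category({"quantum"}))             # expected: None (no clue words)
```

The top-ranked category is the assignment; Maron's reported 84.6% figure refers to how often that top-ranked category was correct on the source items.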
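The Borko row's procedure, correlating word-frequency columns and extracting factors, can likewise be sketched in a few lines. This toy version uses invented counts and plain eigen-decomposition of the correlation matrix for the unrotated loadings; Borko's actual study involved factor rotation and manual interpretation, which are omitted here.

```python
# Illustrative sketch of Borko-style factor analysis of clue-word
# correlations. Document counts are invented; two co-occurring word
# groups stand in for distinct subject factors.
import numpy as np

# Rows = documents, columns = clue words; entries = occurrence counts.
counts = np.array([
    [3, 2, 0, 0],   # docs 1-2 use the first pair of words
    [4, 3, 1, 0],
    [0, 1, 5, 4],   # docs 3-4 use the second pair
    [1, 0, 4, 3],
], dtype=float)

# Pearson correlations between word-count columns.
corr = np.corrcoef(counts, rowvar=False)

# Principal-factor extraction: eigenvectors of the correlation matrix,
# scaled by sqrt(eigenvalue) and ordered by eigenvalue, give unrotated
# factor loadings.
eigvals, eigvecs = np.linalg.eigh(corr)
order = np.argsort(eigvals)[::-1]
loadings = eigvecs[:, order] * np.sqrt(np.maximum(eigvals[order], 0))

# Words that load heavily (same sign) on the same factor form a
# "distinctive grouping" that can be judged against manual categories.
print(np.round(loadings[:, 0], 2))
```

In this toy data the first factor separates the two invented word groups, which is the kind of grouping Borko compared against the American Psychological Association's classification terms.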