MONO91
NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report
Automatic Assignment Indexing Techniques
chapter
Mary Elizabeth Stevens
National Bureau of Standards
Table 2 (cont.)
Investigator
Principles and Methods
Materials Used
Tests
Remarks
Stevens and Urban
Teaching sample for machine compilation of co-occurrence data for words in titles and abstracts with descriptors assigned to these items. Words in titles and cited titles of new items then run against master list of previous word-descriptor associations to derive descriptor-selection scores; highest-scoring descriptors (e.g., up to 12) assigned. Associations derived for 1,600 words co-occurring with any of 70 descriptors previously assigned.
Two teaching samples, approximately 100 items each with 70% overlap, drawn from items indexed by ASTIA. For new items, titles and up to 10 cited titles.
For 59 test items, assignments of descriptors that had occurred for at least 3% of the sample items agreed with ASTIA assignments 58.1%. However, for all descriptors assigned by ASTIA, many not available to machine, overall machine accuracy = 40.1%. For 20 items, independently evaluated by several typical users, the chances that one or more people would agree with the machine assignments ranged from 47.1% when 12 descriptors were assigned to 75.0% average agreement with the machine's first choice.
All test items co
processed and ur
different descrip
assigned to each,
some descriptors
in manual indexir
these items are r
available to the
machine.
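The Stevens and Urban procedure summarized above can be illustrated with a minimal sketch. The table does not give the scoring formula, so the version below assumes the simplest reading: a descriptor's score for a new item is the sum of its teaching-sample co-occurrence counts over the item's words. The function names `build_associations` and `assign_descriptors` and the sample data are hypothetical.

```python
from collections import Counter, defaultdict


def build_associations(teaching_items):
    """Compile word-descriptor co-occurrence counts from a teaching sample.

    teaching_items: iterable of (words, descriptors) pairs, where words come
    from an item's title and abstract and descriptors were assigned manually.
    """
    assoc = defaultdict(Counter)
    for words, descriptors in teaching_items:
        for word in set(words):  # count each word once per item
            for descriptor in descriptors:
                assoc[word][descriptor] += 1
    return assoc


def assign_descriptors(words, assoc, max_descriptors=12):
    """Score descriptors for a new item and return the highest-scoring ones.

    Assumption: the score is a plain sum of co-occurrence counts; the
    original scoring rule is not stated in the table.
    """
    scores = Counter()
    for word in words:
        scores.update(assoc.get(word, {}))
    # Sort by descending score, then alphabetically to break ties.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [descriptor for descriptor, _ in ranked[:max_descriptors]]


# Hypothetical teaching sample of two indexed items.
teaching = [
    (["radar", "antenna"], ["RADAR", "ANTENNAS"]),
    (["radar", "signal"], ["RADAR"]),
]
assoc = build_associations(teaching)
first_choice = assign_descriptors(["radar", "noise"], assoc, max_descriptors=1)
```

Words absent from the master list (here "noise") contribute nothing, which mirrors the limitation noted in the table: descriptors never seen in the teaching sample cannot be assigned by the machine.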
Williams
Discriminant analysis. Sample items previously indexed to a 2-level classification system were subjected to word frequency counts and the theoretical frequencies of the most significant words in each category were compiled. For new items, observed word frequencies compared with theoretical frequencies for each category, highest-scoring category assigned.
Items from "Computer Abstracts on Cards" indexed to 15 major categories each divided into 10 minor categories. 300 abstracts selected to provide equal distribution to 20 sub-categories, 5 each in 4 major categories. Additional items for test similarly selected.
For 63 new items assigned by machine to 1 major and 1 minor category, 78% correct at major level, 64% correct at minor level. For 20 items classified to 1 major and 2 minor categories, 95% correct at major level, 60% and 75% correct at the minor level.
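The comparison of observed with theoretical word frequencies in the Williams entry can be sketched roughly as follows. The table does not state the discriminant function, so this example substitutes a smoothed multinomial log-likelihood as a stand-in score; it is not claimed to be Williams's actual method. The names `category_profiles` and `classify`, the smoothing constant, and the sample data are all hypothetical.

```python
import math
from collections import Counter


def category_profiles(samples, vocabulary, smoothing=1.0):
    """Estimate expected relative word frequencies for each category.

    samples: iterable of (words, category) pairs from previously indexed items.
    vocabulary: set of the "most significant" words to be counted.
    """
    counts = {}
    for words, category in samples:
        counter = counts.setdefault(category, Counter())
        counter.update(w for w in words if w in vocabulary)
    profiles = {}
    for category, counter in counts.items():
        total = sum(counter.values()) + smoothing * len(vocabulary)
        profiles[category] = {
            w: (counter[w] + smoothing) / total for w in vocabulary
        }
    return profiles


def classify(words, profiles):
    """Assign the category whose expected frequencies best fit the item.

    Assumption: fit is measured by multinomial log-likelihood, a stand-in
    for the discriminant function that the table does not specify.
    """
    observed = Counter(words)

    def score(profile):
        return sum(
            n * math.log(profile[w]) for w, n in observed.items() if w in profile
        )

    # Iterate categories in sorted order so ties resolve deterministically.
    return max(sorted(profiles), key=lambda cat: score(profiles[cat]))


# Hypothetical sample items and significant-word vocabulary.
samples = [
    (["compiler", "syntax", "compiler"], "programming"),
    (["circuit", "transistor"], "hardware"),
]
vocabulary = {"compiler", "syntax", "circuit", "transistor"}
profiles = category_profiles(samples, vocabulary)
label = classify(["compiler", "syntax"], profiles)
```

For a 2-level scheme like the one described, the same routine could be applied first over major categories and then over the minor categories within the chosen major one.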