MONO91
NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report
Automatic Assignment Indexing Techniques
chapter
Mary Elizabeth Stevens
National Bureau of Standards
Table 2 (cont.)

Investigator: Borko and Bernick
Principles and Methods Used: Factor analysis to determine distinctive groupings of clue words. Maron's 90 clue words used for word-word correlation and factor analysis. 21 factors developed, and items manually re-indexed to these categories.
Materials, Tests: Same corpus as Maron: 405 computer abstracts, of which 260 were indexed and used to establish factors, and 145 served as new items.
Remarks: Detailed comparison with Maron's technique. For the source items, 63.4% were correctly classified. For the new items, 46.5% were correctly classified, and 48.9% were correct for those items in which 2 or more clue words occurred. Some items cannot be processed because they contain no clue words.
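The classification step of such a factor-based scheme can be sketched as follows. The factor names, clue words, and loadings below are invented for illustration; they are not Borko and Bernick's actual factors, which came from a factor analysis of word-word correlations over Maron's 90 clue words.

```python
# Minimal sketch of clue-word factor-score classification, in the spirit of
# Borko and Bernick's method. Factors, clue words, and loadings are
# illustrative assumptions, not values from the original study.
FACTOR_LOADINGS = {
    "programming": {"code": 0.8, "compiler": 0.7, "subroutine": 0.6},
    "circuits":    {"diode": 0.9, "transistor": 0.8, "voltage": 0.5},
}

def classify(text):
    """Assign the document to the factor with the highest clue-word score."""
    words = set(text.lower().split())
    scores = {
        factor: sum(w for clue, w in loadings.items() if clue in words)
        for factor, loadings in FACTOR_LOADINGS.items()
    }
    best = max(scores, key=scores.get)
    # Items containing no clue words cannot be classified at all --
    # the same limitation noted in the Remarks column.
    return best if scores[best] > 0 else None

print(classify("a compiler translates code into subroutine calls"))  # programming
print(classify("weather report for tuesday"))                        # None
```

Note that the unclassifiable case falls out naturally: an item with no clue words scores zero on every factor.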
Investigator: Swanson
Principles and Methods Used: Text word lookup against clue word lists, constructed by careful analysis of sample items to be exclusively indicative of a particular subject heading. Machine assigns a subject heading to an item if any word on its list occurs in that item.
Materials, Tests: Brief news dispatches available on teletype tape, covering a wide diversity of topics. From study of several 1,000 items, 24 subject headings established and word lists selected, averaging approximately one hundred words per category. 775 new items then tested.
Remarks: Machine assignments compared to manual subject indexing. For a first batch of 500 items, 569 assignments of correct headings, 119 assignments of irrelevant headings, and 32 correct headings missed. The clue word thesaurus was then revised. For 275 additional test items, results showed 282 correct assignments, 29 irrelevant assignments, and 1 missed. For the total, averages of 17% irrelevant assignments and 3% missed. For 200 items, machine and manual assignments were compared with respect to 5 of the subject categories, with the following results:

                 Man   Machine
    Irrelevant     4        25
    Missed        46         4
    Correct       75       116
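Swanson's assignment rule, and the correct/irrelevant/missed tallies used to evaluate it against manual indexing, can be sketched as follows. The clue lists, headings, and sample dispatch are illustrative assumptions; the real lists averaged about one hundred hand-selected words per category.

```python
# Sketch of Swanson's rule: assign a subject heading to an item if any word
# on that heading's clue list occurs in the item. Clue lists and the sample
# dispatch below are invented for illustration.
CLUE_LISTS = {
    "DISARMAMENT": {"disarmament", "test-ban", "warhead"},
    "ELECTIONS":   {"ballot", "candidate", "election"},
    "WEATHER":     {"hurricane", "blizzard", "rainfall"},
}

def assign_headings(item):
    """Return every heading whose clue list shares a word with the item."""
    words = set(item.lower().split())
    return {h for h, clues in CLUE_LISTS.items() if clues & words}

def score(machine, manual):
    """Tally headings as correct, irrelevant, or missed, as in Swanson's tests."""
    return {
        "correct": len(machine & manual),     # assigned and manually indexed
        "irrelevant": len(machine - manual),  # assigned but not manually indexed
        "missed": len(manual - machine),      # manually indexed but not assigned
    }

dispatch = "the candidate demanded a ballot recount after the election"
machine = assign_headings(dispatch)
print(score(machine, manual={"ELECTIONS"}))
# {'correct': 1, 'irrelevant': 0, 'missed': 0}
```

Because a single shared word triggers an assignment, the rule favors recall over precision, which is consistent with the reported imbalance: the machine missed far fewer headings than the human indexers (4 vs. 46) but assigned more irrelevant ones (25 vs. 4).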