MONO91 NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report. Chapter: Automatic Assignment Indexing Techniques. Mary Elizabeth Stevens, National Bureau of Standards.

Table 2. Summary of Automatic Assignment Indexing Test Evaluations

Investigator: Maron
  Principles and Methods: Statistical probabilities of association between clue words and pre-established subject categories. Source items manually indexed to 32 categories. A subclass of words occurring in the corpus selected as clue words, and statistical correlations obtained for 90 such words with the categories assigned. Correlation data and Bayesian probabilities used to assign categories to new items.
  Materials Used: Corpus of 405 items selected from computer abstracts, PGEC, 1959. Full text, 20,000 words of which 3,263 were different words.
  Tests: Of 260 source items, 12 did not contain any clue words, 247 were indexed, and 1 contained an error preventing processing. For the 247 source items indexed, the probability of the top-ranked category being correct was 84.6%. Of 145 new items, 20 were not indexed because they contained no clue words. In the 85 cases where at least 2 clue words occurred, the probability of correct category assignment was 51.8%.
  Remarks: Considerable manual inspection and judgment involved in selection of clue words. Some new items cannot be processed because they contain no clue words.

Investigator: Borko
  Principles and Methods: Factor analysis to determine distinctive groupings of clue words. Word frequency counts made; 90 of the most frequent non-common words manually selected. Correlation matrix computed, factors rotated and interpreted.
  Materials Used: Psychological abstracts: 618 abstracts, 50,000 text words; 6,800 different words.
  Tests: Factors selected were judged to be compatible with, but not identical to, the subject classification terms used for these items by the American Psychological Association.
  Remarks: Some new items could not be processed because they contained no clue words.
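Maron's procedure of combining clue-word/category correlations with Bayesian probabilities can be illustrated in miniature. The sketch below is a modern naive-Bayes reading of the table's description, not Maron's actual computation; the training documents, clue words, and category names are invented for illustration, and add-one smoothing is a present-day convenience his study did not use.

```python
from collections import defaultdict
import math

# Hypothetical training data: (clue-word list, manually assigned category) pairs.
# In Maron's experiment, 247 manually indexed source items played this role.
training = [
    (["circuit", "logic"], "logic-design"),
    (["circuit", "diode"], "circuits"),
    (["logic", "boolean"], "logic-design"),
    (["diode", "transistor"], "circuits"),
]

# Tally category frequencies and clue-word occurrences per category.
cat_counts = defaultdict(int)
word_counts = defaultdict(lambda: defaultdict(int))
for words, cat in training:
    cat_counts[cat] += 1
    for w in words:
        word_counts[cat][w] += 1

def assign_category(clue_words):
    """Rank categories by posterior P(C | words), proportional to
    P(C) * product of P(w | C) over the item's clue words."""
    total = sum(cat_counts.values())
    scores = {}
    for cat, n in cat_counts.items():
        log_p = math.log(n / total)  # prior P(C)
        for w in clue_words:
            # Add-one smoothing so an unseen word does not zero the product.
            log_p += math.log((word_counts[cat][w] + 1) / (n + 2))
        scores[cat] = log_p
    return max(scores, key=scores.get)

print(assign_category(["circuit", "transistor"]))  # expected: "circuits"
```

An item containing none of the selected clue words gives every category the same evidence, which is the situation behind the "not indexed because they contained no clue words" counts in the Tests column.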
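Borko's pipeline (word-frequency counts, a word-by-word correlation matrix, then factors grouping co-varying words) can also be sketched. The toy below uses an eigen-decomposition of the correlation matrix as a stand-in for his factor analysis; the document-by-word counts are invented, and the rotation step his study applied is omitted for brevity.

```python
import numpy as np

# Hypothetical document-by-clue-word count matrix (rows: abstracts,
# columns: clue words). Borko built such counts for 90 clue words
# over 618 psychological abstracts; this toy uses 4 words.
counts = np.array([
    [3, 2, 0, 0],
    [2, 3, 1, 0],
    [0, 0, 3, 2],
    [0, 1, 2, 3],
], dtype=float)

# Word-by-word correlation matrix (columns are the variables).
corr = np.corrcoef(counts, rowvar=False)

# The leading eigenvectors play the role of factors: words that
# co-occur across documents load together on the same factor.
eigvals, eigvecs = np.linalg.eigh(corr)
order = np.argsort(eigvals)[::-1]
leading_factor = eigvecs[:, order[0]]

# Words whose loadings share a sign fall into one distinctive grouping.
grouping = leading_factor >= 0
print(grouping)
```

Here the first two words and the last two words land in opposite groups, mirroring how Borko's rotated factors were interpreted as subject groupings and compared against the American Psychological Association's classification terms.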