MONO91 NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report
Automatic Assignment Indexing Techniques chapter
Mary Elizabeth Stevens, National Bureau of Standards

The choice of 90 clue words in Borko's work with abstracts in the field of psychological literature was apparently dictated by a matrix size which would be convenient for computer manipulation. 1/ However, it happened to coincide with the number of clue words used by Maron in his experiments. Advantage was taken of this coincidence to obtain comparative data on the performance of the two assignment-indexing techniques as applied to the same material.

The 260 computer literature abstracts used by Maron 2/ as source documents were processed to derive a correlation matrix for Maron's 90 manually selected words, which was then factor analyzed. Several sets of factors were extracted, rotated, and the results studied, with a final selection of 21 categories. Since these automatically derived categories did not coincide with Maron's original 32, it was necessary to analyze manually the total group of 405 abstracts (260 "source" and 145 "test" items) and assign them to the new categories, then to study the documents falling into each factor-analytically derived category to determine which of Maron's 90 clue words were category-indicative, and finally to substitute these words in the Bayesian equation used by Maron so as to predict which of these classification categories his probabilistic method should obtain.

The same two sets of 260 "source" and 145 "new" abstracts used by Maron were then submitted to the computer assignment program, which compares the clue words of a new item with the numeric values of the predictor words for each factor category, then computes the score for each item in all categories, and assigns the category with the highest score to the item.
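The assignment step just described can be sketched in modern terms as a simple weighted scorer. This is an illustrative reconstruction, not the original program: the category names, words, and weight values below are invented for the example, and the factor-analytic weights are assumed to be given as input.

```python
# Hypothetical sketch of the factor-analytic assignment step described above.
# Predictor-word weights per category stand in for the factor-analysis output;
# all names and numeric values here are illustrative, not from the study.

def assign_category(item_words, predictor_weights):
    """Score an item against every category and return the highest-scoring one.

    item_words:        set of clue words found in the abstract
    predictor_weights: {category: {word: weight}} derived from factor analysis
    """
    scores = {}
    for category, weights in predictor_weights.items():
        # Sum the weights of the item's clue words that predict this category.
        scores[category] = sum(weights.get(w, 0.0) for w in item_words)
    # Assign the category with the highest score to the item.
    return max(scores, key=scores.get)

weights = {
    "programming": {"compiler": 0.8, "algorithm": 0.6, "code": 0.5},
    "hardware":    {"circuit": 0.9, "transistor": 0.7, "memory": 0.4},
}
print(assign_category({"compiler", "memory", "code"}, weights))  # programming
```

An item is thus forced into exactly one category, which is why the evaluations below can be stated as single percent-correct figures.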
For the source items, Borko and Bernick's results showed 63.4 percent correctly classified, by comparison with the 84.6 percent correctness score originally obtained for them in Maron's experiments. For the new items the factor analysis method scored 48.9 percent correct assignment by comparison with Maron's original 51.8 percent. The later investigators therefore concede that the performance of Maron's technique was somewhat superior for the same items using the clue words originally selected by Maron.

Further experimentation was then carried out (Borko and Bernick, 1963 [78]) using word frequency data for the selection of a new set of 90 clue words, and a classification scheme for 21 categories was again automatically derived. The 405 abstracts were again manually classified to these machine-derived categories by five subject-matter specialists and the two investigators. Comparative data were then obtained for both the Maron assignment formula and the modified classification system assignments in terms of agreement with the manual assignments. For the source items, the percentage of machine assignments agreeing with those made by people was 62.7 when the Bayesian probability formula used by Maron was applied and 61.2 for the factor analysis score system. For the new items, the corresponding percentages were 57.9 and 55.9.

Additional data compared the effects of using the original Maron words and the frequency-based word set (Borko's words) for the same probability formula assignment method. While there was an overlap of approximately 50 percent between Maron's words and Borko's words, the findings indicated that:

1/ Now increased to 150 x 150.
2/ Borko and Bernick, 1962 [72], pp. 9-10.
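The evaluation measure used throughout the comparison above is simple percent agreement between machine and manual assignments. A minimal sketch, with invented category labels standing in for the real classification data:

```python
# Minimal sketch of the agreement measure reported above: the percentage of
# items whose machine-assigned category matches the manual assignment.
# The label sequences are illustrative, not data from the experiments.

def percent_agreement(machine, manual):
    """Return the percentage of positions where the two label lists agree."""
    matches = sum(1 for m, h in zip(machine, manual) if m == h)
    return 100.0 * matches / len(manual)

machine = ["prog", "hw", "prog", "hw"]
manual  = ["prog", "hw", "hw",   "hw"]
print(percent_agreement(machine, manual))  # 75.0
```

Figures such as 62.7 or 55.9 percent in the text are exactly this statistic computed over the 260 source or 145 new abstracts.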