MONO91 NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report Appendix B: Progress and Prospects in Mechanized Indexing appendix Mary Elizabeth Stevens National Bureau of Standards

Williams found an average correct classification of 62 percent for 474 test items automatically assigned to one of four solid state categories 28/. In other tests, 2,754 solid state abstracts were classified into three primary and three secondary categories, using a computer program capable of handling up to 50 clue words, 10 subject categories, and any number of documents. Performance effectiveness ranged from 62 to 88 percent correct by comparison with the original classifications at the more generic level and from 67 to 92 percent correct at the more specific level.

Further progress in the application of statistical association, clumping, and syntactic analysis techniques has also been reported.

Statistical association techniques are concerned with correlations and coefficients of similarity assumed to exist between items or objects sharing common properties. In documentary item applications, document-document similarities are calculated for sharings of the same index terms or for common patterns of citing the same references, of being cited by the same other documents, and the like.

Word-association techniques include the development of absolute or relative frequencies of co-occurrence in a given set of documents, such as those representative of a specific subject matter field. Various normalizing procedures can be used to remove the effects of the tendency of certain words to occur frequently in general. Spiegel and associates 38/ at Mitre Corporation have explored means of normalization to eliminate effects of length of text strings, relative positions of words in a string, and vocabulary size.

Ernst 39/ reports that at Arthur D. Little: "We are ... seeking to provide a working retrieval system which will incorporate associative features.
The objective will be to make use of automatically computed index term associations as a basis for detecting and presenting an appropriate list of near-synonyms for the concepts desired by a user, essentially the automatic generation of a limited thesaurus in response to individual user requests."

In Switzer's model 40/, co-occurrence statistics of index terms consisting of words from title or text, authors' names, and words and author names from cited titles are used. Significant probabilities for such co-occurrences are then derived.

Methods that group objects or items in terms of co-occurrence data for their properties or characteristics are involved in the "clumping" techniques as proposed at the Cambridge Language Research Unit. Further investigations into the development of the basic CLRU approach have been conducted at the Linguistic Research Center at the University of Texas, by Dale and others 30, 41/. In this work, simulation of associative document retrieval by computer gave results for 260 computer abstracts, using the same 90 clue words as previously used by Borko:

"The recall ratios in the test requests were high (i.e., very few relevant documents were not retrieved); relevance ratios were characteristically smaller (of the order of 10 percent). However, since the output lists are ordered, it is interesting to note that the relevance ratios are significantly much higher in the upper portions of the output lists (roughly between 25 percent and 50 percent in the upper fourth of the output lists), and that recall ratios are still of the order of 50-70 percent."
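The recall and relevance ratios quoted above can be computed at any cutoff of an ordered output list. The following is a minimal sketch in Python, offered only as an illustration and not as the program used in the Texas experiments. The function name `ratios_at_cutoff`, the document identifiers, and the relevance judgments are all hypothetical.

```python
def ratios_at_cutoff(ranked, relevant, fraction):
    """Return (recall, relevance) for the top `fraction` of an ordered list.

    recall    = relevant items retrieved / all relevant items
    relevance = relevant items retrieved / items retrieved
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not ranked:
        return 0.0, 0.0
    # Keep at least one item so the relevance ratio is always defined.
    cutoff = max(1, round(len(ranked) * fraction))
    retrieved = ranked[:cutoff]
    hits = sum(1 for doc in retrieved if doc in relevant)
    recall = hits / len(relevant) if relevant else 0.0
    relevance = hits / len(retrieved)
    return recall, relevance


# Hypothetical ordered output list (best match first) and relevance judgments.
ranked = ["d3", "d7", "d1", "d9", "d4", "d2", "d8", "d5"]
relevant = {"d3", "d1", "d5"}

upper_fourth = ratios_at_cutoff(ranked, relevant, 0.25)
full_list = ratios_at_cutoff(ranked, relevant, 1.0)
```

With this toy data the relevance ratio is 0.5 in the upper fourth of the list and 0.375 over the full list, while recall falls from 1.0 for the full list to one third for the upper fourth. This is the same kind of trade-off the quotation describes, though the numbers themselves carry no empirical weight.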
In 1964 a report of the Astropower Laboratory 42/ outlined a "semantic space screening model" based on the assumptions that keywords or phrases have quantifiable 'values', that by itemizing the keywords in a document sufficient information is obtained for its classification, and that by adding the values for the keywords in a document the pertinence of that document to a particular subject field can be determined. A training sample consisted of 120 abstracts drawn from six subfields of electrical engineering. Results showed successful classification of source items, using four different classification formulas, as ranging from 49 to 96.3 percent. Results with test items ranged from 32.9 to 69.0 percent accuracy.
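The additive assumption behind such a screening model can be illustrated with a short Python sketch. It does not reproduce any of the four Astropower classification formulas, which are not given here; it simply sums hypothetical keyword values per subject field and picks the field with the largest total. The field names, keyword values, sample abstract, and function names are all assumptions made for illustration.

```python
def score_document(text, keyword_values):
    """Sum the values of the keywords found in `text`, per subject field."""
    words = text.lower().split()
    scores = {}
    for field, values in keyword_values.items():
        # Words with no assigned value contribute nothing to the total.
        scores[field] = sum(values.get(word, 0.0) for word in words)
    return scores


def classify(text, keyword_values):
    """Assign the document to the field with the highest summed value."""
    scores = score_document(text, keyword_values)
    return max(scores, key=scores.get)


# Hypothetical keyword values for two subfields (not taken from the report).
keyword_values = {
    "antennas": {"antenna": 2.0, "radiation": 1.5, "array": 1.0},
    "circuits": {"amplifier": 2.0, "transistor": 1.5, "array": 0.5},
}

abstract = "A transistor amplifier design for driving an antenna array"
scores = score_document(abstract, keyword_values)
label = classify(abstract, keyword_values)
```

Here the abstract scores 3.0 for "antennas" and 4.0 for "circuits", so it is assigned to "circuits". In a real system the keyword values would have to be estimated from a training sample, as the report describes.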