MONO91 NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report Indexes Generated by Machine-Automatic Derivative Indexing chapter Mary Elizabeth Stevens National Bureau of Standards

"[In the frequency matrix] . . . the diagonal elements . . . give the total frequency of an index term and the off-diagonal gives the frequency of co-occurrence of two terms. The diagonal of the 'context' matrix represents that portion of the total vocabulary with which an individual term has been coordinated, and the off-diagonal the extent to which two terms have common context. . . . Such matrices give a basis for examining the extent to which terms are generic or specific within the context of the collection of documents. One can speculate that terms occurring with high frequency and wide context, i.e., with frequencies distributed amongst all or nearly all off-diagonal elements of the matrix are of such broad connotation as to be indifferent discriminators of content . . . The frequency and context matrices can again be used to determine the modifiers with which they can most meaningfully be coupled for the collection of documents being considered." 1/

Finally, Baxendale notes that on the basis of her studies it should be possible to select quasi-subject headings based on frequency counting criteria, but then to order the remaining vocabulary of selected terms according to contextual measures of association which are semantic, syntactic, or statistical in nature. Experimental results for a collection of 1,500 documents included semantic associations between "searching" and "retrieval", syntactic associations of "machine" or "literature" with "retrieval", and the apparently misleading association of "metal" with "retrieval" which, however, had statistical significance within the particular document sample.
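The frequency and context matrices described in the quotation can be illustrated with a minimal sketch. It is not Baxendale's program: the function names, the toy documents, and the reading of "frequency" as a count of documents and of "context" as the set of distinct co-occurring terms are assumptions made here for illustration only.

```python
from itertools import combinations

def frequency_matrix(docs, vocab):
    """Diagonal: number of documents indexed by a term (assumed reading of
    'total frequency'). Off-diagonal: number of documents in which two
    terms co-occur."""
    index = {term: i for i, term in enumerate(vocab)}
    n = len(vocab)
    freq = [[0] * n for _ in range(n)]
    for terms in docs:
        present = sorted({index[t] for t in terms if t in index})
        for i in present:
            freq[i][i] += 1
        for i, j in combinations(present, 2):
            freq[i][j] += 1
            freq[j][i] += 1
    return freq

def context_matrix(freq):
    """Diagonal: number of distinct other terms a term has been coordinated
    with. Off-diagonal: number of terms that two terms share as context."""
    n = len(freq)
    neighbours = [
        {j for j in range(n) if j != i and freq[i][j] > 0} for i in range(n)
    ]
    return [
        [len(neighbours[i] & neighbours[j]) for j in range(n)] for i in range(n)
    ]

# Hypothetical toy collection: each document is the set of its index terms.
vocab = ["literature", "machine", "metal", "retrieval", "searching"]
docs = [
    {"machine", "retrieval", "searching"},
    {"literature", "retrieval"},
    {"metal", "retrieval"},
    {"machine", "searching"},
]

freq = frequency_matrix(docs, vocab)
ctx = context_matrix(freq)

for term, f_row, c_row in zip(vocab, freq, ctx):
    print(f"{term:<12} freq={f_row}  context={c_row}")
```

In this toy collection "retrieval" has both the highest diagonal frequency and a context spanning every other term, which is the pattern the quotation associates with terms too broad to discriminate content.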
2/ Other investigators who have explored noun-adjective clues for selection include Anger, Chonez, Langleben and Shumilina, and Swanson. Anger looked for relationships indicated by syntactic dependencies or by noun-adjective and adjective-adverb linkages, and gave in an appendix a suggested program for phrase inversions. 3/ Chonez has described a computer program which by recognizing "separating" words, especially prepositions, and applying "pseudo-grammatical" rules compiles an index to English language items in the fields of ionized gas physics and thermonuclear fusion. It is claimed that: "The subject index thus prepared is similar in presentation to Luhn's KWIC indexes, but is fundamentally different in conception and is in fact intermediate between... (this) ... and the conventional alphabetic subject indexes." 4/ Langleben and Shumilina are concerned with machine-aided procedures for translation from natural language materials to an intermediary or documentation language.

1/ Ibid, pp. 215-216.
2/ Ibid, pp. 216-217.
3/ Anger, 1961 [15], pp. III-6ff.
4/ Chonez, et al, 1963 [119], p. 31.