MONO91 NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report Indexes Generated by Machine-Automatic Derivative Indexing chapter Mary Elizabeth Stevens National Bureau of Standards

for purposes of identifying document contents and to use data on the joint occurrence of words in the same sentence or similar contexts as grouping criteria. Clark points out in particular that the use of ordered pairs and longer sequences of words to express a single concept may be highly characteristic of the special technical language used in a specific subject field, notably in the social sciences. 1/

Others who have explored word n-tuples as selection criteria for automatic extraction operations include Szemere, Levery, and Yakushin. Szemere reports an investigation of 39 Swedish patent specifications in the field of switching circuits, looking for significant word-pairs, with emphasis on noun-adjective combinations (1962 [591]). The objectives of a project headed by Levery at IBM-France have been reported as follows:

"A series of experiments is planned in the fields of automatic indexing of technical texts and technical vocabulary analysis.

"A statistical method will be tested to determine the degree of closeness in meaning of words. The method will consist of studying the pairs of words which appear together in the majority of texts and calculating a coefficient of correlation from the frequencies. Such work will result in a standard list of notions frequencies for a particular kind of information.

"Starting from this list, new experiments will be made so as to obtain a list of keywords representing each text. The method will use statistical comparison between the distribution of frequencies of notions contained in a text and the standard distributions obtained for the entire corpus."
2/

Yakushin (1963 [654]) develops a variation of the word-pair principle in which he looks for those pairs where the words are, or suggest, names of objects, such as "table-leg". He suggests, further, that so-called "basis nouns" can be established for a given scientific field and entered into an inclusion dictionary, which also contains codes for the lexical classes to which the word can belong and codes for determining whether or not the word can join with another as a "basis term". Machine routines are then suggested to determine whether or not given terms are jointly part of the same text, whether one textually precedes another in a given text, and whether or not there is a "nomenclator" pair. Depending upon the frequency of occurrence of identical or semantically related nomenclator constructions, it is claimed that subject concepts can be detected. That is:

"The method is founded on the finding in a text of so-called basis terms, established by list, and of the words which explain them. These explanatory words, which in different contexts refer to one basis term, are grouped and ordered according to definite rules into a subject concept." 3/

1/ Clark, 1960 [123], p. 460.
2/ National Science Foundation's CR&D report no. 11, [430], p. 118.
3/ Yakushin, 1963 [654], p. 16.
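
As a rough illustration of the word-pair correlation idea in the Levery project statement quoted above, the following Python sketch counts how often two words appear in the same text and derives a correlation coefficient from those frequencies. The report does not say which coefficient was intended; the phi coefficient used here is an assumption, as are the function name `pair_correlations`, the whitespace tokenization, and the treatment of each text as a set of words.

```python
from itertools import combinations
from math import sqrt


def pair_correlations(texts):
    """Map each word pair to a phi coefficient of text-level co-occurrence.

    Illustrative only: the coefficient and tokenization are assumptions,
    not details taken from the project description.
    """
    # Treat each text as the set of lowercased, whitespace-separated words.
    docs = [set(text.lower().split()) for text in texts]
    n = len(docs)
    vocab = sorted(set().union(*docs))
    # Number of texts in which each word appears.
    doc_freq = {w: sum(w in d for d in docs) for w in vocab}

    result = {}
    for a, b in combinations(vocab, 2):
        both = sum(a in d and b in d for d in docs)  # texts containing both words
        na, nb = doc_freq[a], doc_freq[b]
        denom = sqrt(na * (n - na) * nb * (n - nb))
        if denom == 0:
            # A word present in every text has no variance; phi is undefined.
            continue
        result[(a, b)] = (n * both - na * nb) / denom
    return result


if __name__ == "__main__":
    sample = [
        "switching circuit relay",
        "switching circuit",
        "relay contact",
        "contact",
    ]
    for pair, phi in sorted(pair_correlations(sample).items()):
        print(pair, round(phi, 3))
```

Pairs with a coefficient near 1 appear in the same texts almost exclusively, which is one plausible reading of "closeness in meaning" in the quoted statement; a production version would need a proper tokenizer and a minimum-frequency cutoff.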