MONO91 NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report Indexes Generated by Machine-Automatic Derivative Indexing chapter Mary Elizabeth Stevens National Bureau of Standards

and Youden's indexes to ACM papers (1963 [659] and [660]) illustrate single-column formats that alleviate this problem by extending the title line to 103-106 characters, exclusive of the identification code. Youden has calculated that, for the titles in the field of computer literature which he analyzed, 30 percent would have been truncated in 60-character title-line formats, but that only 2 percent would have been chopped by 103-character title length limits. 1/

A second disadvantageous effect of machine production requirements in most KWIC indexes is the tedious sequential scanning necessary because of the unbroken organization of the page format and the long blocks that occur for frequently occurring word entries. Doyle (1959 [168], 1961 [166]) has investigated this problem of block length and suggests either that alphabetization be carried out to the words following those in the indexing window or that the entries in the block be permuted also in a second-order cycle. The latter suggestion has the advantage of facilitating any two-term coordinate indexing type of search, "because one can now look up directly any pair of subject words, regardless of whether or not they occur adjacently in a sentence." 2/

Redundancy in KWIC indexes, which aggravates the sequential scanning and the long-block fatigue effects, is in large part the result of difficulties in establishing the most appropriate bounds for exclusion or "stop" lists.
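Doyle's two suggestions for breaking up long blocks can be illustrated with a minimal Python sketch. It is not Doyle's procedure: the stop list, the sample titles, and the function names are all hypothetical, and reading the "second-order cycle" as the listing of every ordered pair of significant words in a title is an assumption made for illustration only.

```python
from itertools import permutations

# Illustrative stop list and titles (assumptions, not from the source).
STOP_WORDS = frozenset({"a", "an", "the", "of", "by", "for", "in", "and"})
TITLES = [
    "Automatic indexing of documents",
    "Indexing by machine",
    "Machine indexing of abstracts",
]


def significant_words(title, stop_words=STOP_WORDS):
    """Return the lowercased words of a title that are not on the stop list."""
    return [w for w in title.lower().split() if w not in stop_words]


def block_sorted_by_following_words(titles, keyword):
    """First suggestion: order the block for one keyword by the words
    that follow the keyword in each title."""
    block = []
    for title in titles:
        words = title.lower().split()
        for i, word in enumerate(words):
            if word == keyword:
                block.append((words[i + 1:], title))
    block.sort(key=lambda entry: entry[0])
    return [title for _, title in block]


def second_order_entries(titles, stop_words=STOP_WORDS):
    """Second suggestion (one possible reading): list every ordered pair of
    distinct significant words in a title, so that any pair can be looked
    up directly whether or not the two words are adjacent."""
    entries = set()
    for title in titles:
        unique_words = sorted(set(significant_words(title, stop_words)))
        for first, second in permutations(unique_words, 2):
            entries.add((first, second, title))
    return sorted(entries)


if __name__ == "__main__":
    for title in block_sorted_by_following_words(TITLES, "indexing"):
        print(title)
    for first, second, title in second_order_entries(TITLES):
        print(f"{first} / {second}: {title}")
```

In this sketch the pair "machine / indexing" leads to the title "Indexing by machine" even though the two words are separated by a stop word, which is the kind of two-term lookup the quoted passage describes. The cost is that the number of entries per title grows roughly with the square of the number of significant words.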
We have previously distinguished machine-generated indexes of the derivative type from certain of the machine-compiled indexes primarily on the basis that in the first case, the criteria for determining the significance of the keywords to be used as the index access points are applied automatically during the machine processing, even if the selectivity so achieved is only "negative selectivity." The amount of index entry redundancy, of too many entries and of irrelevant entries, is, in simple KWIC indexing, a direct function of the length and contents of the stop list. In Luhn's original proposals for both KWIC and other types of automatic indexing, he pointed out the importance of the rules which must be established in order to differentiate the significant words from the nonsignificant. He says, for example:

"Since significance is difficult to predict, it is more practicable to isolate it by rejecting all obviously nonsignificant or `common' words, with the risk of admitting certain words of questionable value. Such words may subsequently be eliminated or tolerated as `noise'. A list of nonsignificant words would include articles, conjunctions, prepositions, auxiliary verbs, certain adjectives, and words such as `report', `analysis', `theory', and the like." 4/

1/ W. W. Youden, 1963 [458], p. 331.
2/ Doyle, 1961 [166], p. 13.
3/ Artandi, 1963 [20], p. 15.
4/ Luhn, 1959 [381], p. 289.
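The dependence of KWIC entry volume on the stop list can be made concrete with a small Python sketch of "negative selectivity": every title word becomes an index entry unless it appears on the stop list. The stop list below is illustrative only, loosely following the word classes Luhn names, and the function name and entry layout are assumptions rather than any published KWIC program.

```python
# Illustrative stop list (an assumption): a few articles, conjunctions,
# prepositions, and auxiliary verbs, plus the three "common" words
# Luhn cites as examples.
STOP_WORDS = frozenset({
    "a", "an", "the", "and", "or", "of", "on", "in", "for", "to", "by",
    "is", "are", "be", "report", "analysis", "theory",
})


def kwic_entries(titles, stop_words=STOP_WORDS):
    """Return sorted (keyword, left context, right context) entries,
    one for each title word that is not on the stop list."""
    entries = []
    for title in titles:
        words = title.split()
        for i, word in enumerate(words):
            key = word.strip(".,;:()\"'").lower()
            if not key or key in stop_words:
                continue  # negative selectivity: reject listed words only
            left = " ".join(words[:i])
            right = " ".join(words[i:])
            entries.append((key, left, right))
    return sorted(entries)


if __name__ == "__main__":
    sample = ["A Report on the Theory of Automatic Indexing"]
    for key, left, right in kwic_entries(sample):
        print(f"{left:>30}  {right}")
    # With an empty stop list every word of the title generates an entry.
    print(len(kwic_entries(sample, stop_words=frozenset())))
```

For the sample title, the illustrative stop list leaves two entries (under "automatic" and "indexing"), whereas an empty stop list yields eight. A shorter list admits more "noise" entries; a longer one risks rejecting words that some searcher would have wanted.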