MONO91 NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report Operational Considerations chapter Mary Elizabeth Stevens National Bureau of Standards
Moreover, to date, very little material in the scientific and technical literature is available in this form. As of 1961, it was reported that a survey by McGraw-Hill indicated that only about 2 or 3 percent of the publications in the United States were then prepared by typesetting tape, that most of this was in the form of Monotype tape which because of its 30-column width and special format is not generally compatible with tape reading equipment, and that tapes had many errors in them which would require considerable effort to correct. 1/ As of late 1963, Bennett reports: "Computer processing of natural language text material requires that a body of data be available in machine-readable form. At present such a body of data results only from a direct human copying process. An inquiry into existing transcriptions of text which were machine-readable showed that they were abbreviated both in terms of completeness and in number of symbols represented. As an alternative text produced as a by-product of typesetting operations is clearly an eventual possibility, but present practices make the detection of unit delimiters such as ends-of-sentences difficult." 2/
In the future, both machine-usable text from publishers and printers and the similarly machine-usable paper tape produced as a byproduct from the original keystroking of manuscript on such equipment as Flexowriters and Justowriters may alleviate this problem for new items. Nevertheless, the wealth of the world's present literature, the informal and unpublished technical reports of high current interest but limited initial distribution, and material acquired from foreign sources, will continue to pose for the foreseeable future major problems either of automatic reading of the printed page or of human re-transcription at high cost.
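The difficulty Bennett notes in detecting unit delimiters such as ends-of-sentences can be illustrated with a minimal sketch. It is not drawn from the report or from any system it cites: the function names, the abbreviation list, and the sample sentence are all hypothetical, chosen only to show that a period may close a sentence, mark an abbreviation, or serve as a decimal point.

```python
import re

# Hypothetical abbreviation list, for illustration only; a real list
# would be far longer and would depend on the subject field.
ABBREVIATIONS = {"fig.", "p.", "pp.", "no.", "vol.", "dr."}


def naive_split(text):
    """Treat every period followed by whitespace as an end of sentence."""
    return [s for s in re.split(r"(?<=\.)\s+", text.strip()) if s]


def guarded_split(text):
    """Accept a period as an end of sentence only if the token carrying it
    is not a listed abbreviation and the next token, if there is one,
    begins with a capital letter."""
    tokens = text.split()
    sentences, current = [], []
    for i, tok in enumerate(tokens):
        current.append(tok)
        if not tok.endswith("."):
            continue  # e.g. "3.5": the period is inside the token
        if tok.lower() in ABBREVIATIONS:
            continue  # e.g. "Fig.", "p."
        nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
        if nxt and not nxt[0].isupper():
            continue  # next word is not capitalized: probably not a boundary
        sentences.append(" ".join(current))
        current = []
    if current:
        sentences.append(" ".join(current))
    return sentences


sample = "See Fig. 3 on p. 47. The tape is 3.5 cm wide."
print(naive_split(sample))
print(guarded_split(sample))
```

On the sample, the naive rule breaks the text after "Fig." and "p." as well as after "47.", while the guarded rule yields the two intended sentences. The guarded rule is itself only a heuristic: it would still fail, for instance, where an abbreviation happens to end a sentence.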
While there have been many promising developments in automatic character recognition techniques, the devices that are now available for production use are limited to small character sets, such as a single alphabet in a single font, often of special design. The multi-font page reader is not only not yet commercially available but may not become so for some years to come. Even if it were, there are many unresolved and as yet incompletely specified problems involved in the development of suitable rules for the machine so that it can distinguish between title or page number and text, figure caption and text, author's name in a cited reference and the title of the paper cited, and the like. A case in point, not only for automatic reading equipment of the future but for machine processing of machine-usable material available today, is the difficulty of machine recognition of punctuation marks as used for different purposes. 3/ In the absence, then, both of scientific and technical documents already in machine language form and of character recognition equipment capable of reading the printed page, we are left with the unsatisfactory situation of re-transcribing input material either by use of a tape typewriter or by keypunching to punched cards. That this situation is unsatisfactory and is a major bottleneck in machine processing of text in excess of the bibliographic citation data only is evidenced by such typical statements as these:
1/ Cornelius, 1962 [140], p. 47.
2/ Bennett, 1963 [50], p. 141.
3/ See Bennett quotation above; Luhn, 1959 [384], p. 22, and Coyaud, 1963 [143].