MONO91 NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report Operational Considerations chapter Mary Elizabeth Stevens National Bureau of Standards

ization, especially in the case of citation indexes compiled by machine. 1/ O'Connor notes, however, that "the provision of pre-editing information can slow down the keypuncher or typist, increase the chance of mistakes, and require more intelligence or training on the typist's part." 2/

Questions of error detection and error correction apply both to the original text and to transcribed versions if these are necessary. That is, the basic documents themselves may contain typographical errors, misspellings, and the like, and additional errors are bound to occur at all subsequent stages requiring human processing. Wyllys discusses the need for the correction of spelling errors, mentions suggested computer programs for detection, and cites a private communication from Stiles suggesting that the criteria for accepting words as valid be either that they are identified as already being in the system vocabulary or that they occur at least twice in the input item. 3/ Swanson's analysis of the reasons for retrieving irrelevant, and failing to retrieve relevant, material in the case of text searching on the nuclear physics abstracts includes typical data on the effect of errors. 4/ He found, for example, that failures to record hyphenated words, subscripts, superscripts and other special symbols accounted for about 5 percent of failures to retrieve relevant items, and errors in transcription of either text or search instructions accounted for another 3 percent of these failures. Errors in keypunching of the search requests alone accounted for 4 percent of the cases of irrelevant retrievals. By contrast, in the newspaper clippings experiments, where the input material was already in machine-usable form, transcription errors were not a factor, but the input tape itself had many errors.
In this special case, however, Swanson reports: "Garbles are not important simply because messages are sufficiently redundant to insure that even if one or two keywords for a given category are garbled, almost invariably others are present." 5/

The news clippings material used by Swanson represents one class of materials that are today initially available in machine-usable form, because the original recording of the message or text resulted in a machine-usable medium, such as punched paper tape. A punched paper tape is produced as the product of many typesetting operations, especially for newspaper and magazine publication, and this will be increasingly true in the future, together with computer-prepared tapes for input to automatic typographic composing equipment. To date, however, equipment to convert from these tapes to the particular machine language of a given computer processing system is largely non-available, is costly, and is highly subject to error. 6/

1/ See, for example, Atherton, 1962 [25], p. 4; Marthaler, 1963 [399], p. 22. However, at least one computer program has been developed to assist in this process. See Thompson, 1963 [600], p. 11-1: "The present program takes bibliographic citations and automatically arranges them into a standard format in such a way that the various parts of the citation are unambiguously identified. These standardized citations can later be processed by sorting and matching procedures to identify similar citations and to effect various rearrangements."
2/ O'Connor, 1960 [444], p. 8.
3/ Wyllys, 1963 [653], p. 15.
4/ Swanson, 1961 [586], Appendix.
5/ Swanson, 1963 [580], p. 5.
6/ Compare, for example, Savage, 1958 [521], p. 11: "The use of tape as the original input to the process has offered a number of problems which have yet to be solved. One is the occurrence of typographical errors."
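
The word-acceptance criterion attributed to Stiles in the discussion of spelling errors above (a word is taken as valid if it is already in the system vocabulary or occurs at least twice in the input item) can be illustrated with a short modern sketch. This is only an illustration under stated assumptions, not a reconstruction of any program cited in the text: the function name `flag_suspect_words`, the `min_count` parameter, the sample vocabulary, and the rule of treating each run of letters as a word are all hypothetical choices made for the example.

```python
import re
from collections import Counter


def flag_suspect_words(text, vocabulary, min_count=2):
    """Return words that fail both acceptance tests, in first-seen order.

    A word is accepted if it is in `vocabulary` (assumed stand-in for the
    system vocabulary) or if it occurs at least `min_count` times in the
    input item. Everything else is flagged for human review.
    """
    # Assumption: a "word" is any run of letters, compared case-insensitively.
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    known = {w.lower() for w in vocabulary}

    suspects = []
    for word, count in counts.items():
        if word not in known and count < min_count:
            suspects.append(word)
    return suspects


# Hypothetical vocabulary and input item, for illustration only.
vocabulary = {"the", "of", "is", "neutron", "scattering", "measured"}
item = ("The scattering of neutrons is measured; "
        "the neutrons scatering is measured.")

print(flag_suspect_words(item, vocabulary))
```

In this sample, "neutrons" is absent from the vocabulary but is accepted because it appears twice, while the single occurrence of "scatering" is flagged. The sketch also shows the limit of the criterion: a misspelling that is repeated consistently within one item would pass unflagged.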