MONO91
NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report
Operational Considerations
chapter
Mary Elizabeth Stevens
National Bureau of Standards
8. OPERATIONAL CONSIDERATIONS
Whatever the verdict of evaluation of one or more automatic indexing techniques,
whether of the derivative, modified derivative, or assignment type, there are certain
operational considerations and problems that typically affect any attempt to apply such
techniques in actual production operations. These considerations, which also affect lin-
guistic data processing operations in general, include input considerations, availability of
methods or devices for converting text to machine-usable form, programming consider-
ations, questions of format and content of output, and problems of customer acceptance of
the machine products.
8.1 Questions of input
Input considerations include, first, questions of the extent and availability of mate-
rial which can be handled directly by the machine. This may be limited to title only, to
title plus abstract, title plus other material, 1/ preselected text or automatically gener-
ated extracts; or it may in a few cases extend to full running text. Possible future re-
quirements may extend to the processing not only of full text but of interspersed graphic
material (equations, charts, diagrams, drawings, photographs) as well.
We have considered typical arguments for and against the limitation of input to titles
only, to augmented titles, and to abstracts in other sections of this report. The points to
be emphasized here are requirements for pre-editing or post-editing, provisions for error
detection and error correction, the time and cost requirements of conversion equipment if
material is not already available in machine-usable form, and the like. As Cornelius
suggests:
"Present day computers, if used for machine indexing, will be generally input
limited and will require excessive data preparation. Causes of these limitations
are: time required for translation to machine language, verification of this ma-
chine language, and the capability or lack of capability of correction in the input
media." 2/
Examples of pre-editing requirements, even for the simple case of keyword-in-
title indexing, include the spelling out of chemical symbols, the encoding or the omission
of subscripts and superscripts, insertions of hyphens to prevent indexing of a word, and
substitutions of blanks for hyphens in compound words to assure indexing of each com-
ponent. 3/ For full text, a far more extensive and elaborate set of rules and conventions
must be developed and applied. 4/ Other editing may be required for format standard-
1/ This may specifically include cited titles, as suggested variously by Bohnert, 1962
[69], p. 19; Giuliano and Jones, 1962 [229], p. 10; Swanson, 1963 [580], p. 1;
Gallagher and Toomey, 1963 [205], p. 53; and as used in the SADSACT method, see
pp. 98-99 of this report.
2/ Cornelius, 1962 [140], p. 42.
3/ See, for example, Kennedy, 1961 [311], p. 120.
4/ See, for example, the sophisticated proposals of Nugent, 1959 [441], and Newman
et al., 1960 [439].
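
The pre-editing conventions cited above for keyword-in-title indexing can be illustrated by a minimal sketch. It assumes that keywords are delimited by blanks, so that a hyphen binds two words into one index entry and a blank separates them. The lookup tables, the function names, and the `_{...}` and `^{...}` markup for subscripts and superscripts are hypothetical choices made for this illustration; they are not taken from Kennedy or from any of the systems discussed in this report.

```python
import re

# All tables below are hypothetical illustrations, not taken from the report.
CHEMICAL_SYMBOLS = {"NaCl": "sodium chloride", "H2O": "water"}
SPLIT_COMPOUNDS = {"electron-beam"}      # index each component separately
BIND_PHRASES = {"New York": "New-York"}  # keep "York" from being indexed alone
STOP_WORDS = {"of", "in", "the", "and", "a", "on"}


def pre_edit(title):
    """Apply the four pre-editing conventions to a title string."""
    # Assumed markup: encode subscripts inline (H_{2}O -> H2O),
    # omit superscripts entirely (x^{2} -> x).
    title = re.sub(r"_\{([^}]*)\}", r"\1", title)
    title = re.sub(r"\^\{[^}]*\}", "", title)

    words = []
    for word in title.split():
        if word in CHEMICAL_SYMBOLS:
            # Spell out the chemical symbol.
            word = CHEMICAL_SYMBOLS[word]
        elif word.lower() in SPLIT_COMPOUNDS:
            # Substitute a blank for the hyphen so each component is indexed.
            word = word.replace("-", " ")
        words.append(word)
    title = " ".join(words)

    # Insert a hyphen so the second word is not indexed on its own.
    for phrase, bound in BIND_PHRASES.items():
        title = title.replace(phrase, bound)
    return title


def keywords(title):
    """Return the blank-delimited index words of the pre-edited title."""
    return [w for w in pre_edit(title).split() if w.lower() not in STOP_WORDS]


if __name__ == "__main__":
    print(keywords("Solubility of NaCl in H_{2}O"))
    print(keywords("Electron-beam welding in New York"))
```

In this sketch the editing decisions are reduced to table lookups; in the operations described in the text they would be made by a human editor applying written rules, which is the source of the data preparation cost that Cornelius emphasizes.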