MONO91
NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report
Indexes Generated by Machine-Automatic Derivative Indexing
chapter
Mary Elizabeth Stevens
National Bureau of Standards
Taking more text as the basis for automatic derivative indexing adds, of course, the
problems and costs of keystroking additional input material. At the same time, most of
the major problems of scatter of references, synonymity, redundancy and exclusive
reliance on the author's own language and terminology not only remain but may quite
probably be intensified. The problems of establishing suitable rules for selection of
significant words are aggravated, not only by the far larger number of different words to
be processed, but because of unresolved problems in effectively relating length of index
and depth of indexing to the length of the document. 1/
There are, however, a number of practical suggestions by which machine augmentation
of titles might be accomplished. First is the invariant selection of words that are
capitalized, other than those that begin a sentence. As Wyllys points out, this type of
selection criterion would emphasize proper names, and these in turn might be particularly
valuable clues, especially in a military intelligence situation. 3/ It has also been
suggested that the selection criteria should depend on particular pre-specified contexts,
such as being preceded by the words "the results were ...", "in conclusion ...", and
the like.
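
The two selection criteria just described can be illustrated with a minimal Python sketch. It is not drawn from the report: the function names, the naive sentence splitter, and the contents of CUE_PHRASES are assumptions made for illustration, and a practical system would still pass the selected words through a stop list.

```python
import re

# Hypothetical cue phrases; the text gives only these two as examples.
CUE_PHRASES = ("the results were", "in conclusion")

WORD = r"[A-Za-z][A-Za-z'-]*"


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def capitalized_terms(text):
    """Select capitalized words other than those that begin a sentence."""
    selected = []
    for sentence in split_sentences(text):
        words = re.findall(WORD, sentence)
        for word in words[1:]:  # skip the sentence-initial word
            if word[0].isupper() and word not in selected:
                selected.append(word)
    return selected


def words_after_cue(text):
    """Select the words that follow a cue phrase within the same sentence."""
    selected = []
    for sentence in split_sentences(text):
        for cue in CUE_PHRASES:
            match = re.search(re.escape(cue), sentence, re.IGNORECASE)
            if match:
                selected.extend(re.findall(WORD, sentence[match.end():]))
                break  # one cue per sentence is enough for this sketch
    return selected


sample = ("The survey was run by Luhn at IBM. "
          "In conclusion, the method of Wyllys works. It is cheap.")
print(capitalized_terms(sample))
print(words_after_cue(sample))
```

As the sketch suggests, the capitalization rule tends to surface proper names, while the context rule returns every word after the cue, common words included.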
A second type of machine selection procedure is the converse of the exclusion or
stop list, namely, an inclusion list or dictionary which may involve especially significant
words for a particular subject matter area or words that are of importance to a particular
organization. In the discussions of the Area 5 ICSI papers it was remarked:
"Another complication is that mechanized indexing finds in a paper what was
important to the author. What happens if there is something in the paper not
important to the author but of importance to the indexer? One possibility is
to have a list of words and phrases expressing the interests of a particular
collection, which the machine looks for in the papers. If this word or phrase
occurs even once, it should be picked up as an indexing term." 4/
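
An inclusion list of this kind can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name, the normalization to lowercase alphanumeric tokens, and the sample list are hypothetical and do not come from the sources quoted.

```python
import re


def normalize(text):
    """Lowercase the text and reduce it to space-separated alphanumeric tokens."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def inclusion_terms(text, inclusion_list):
    """Return every inclusion-list word or phrase that occurs at least once."""
    padded = f" {normalize(text)} "  # padding forces whole-token matches
    hits = []
    for term in inclusion_list:
        key = normalize(term)
        if key and f" {key} " in padded:
            hits.append(term)
    return hits


# Hypothetical list expressing the interests of a particular collection.
interests = ["magnetic tape", "punched cards", "index"]
paper = "We describe a keyword-in-context index. Magnetic tape was used once."
print(inclusion_terms(paper, interests))
```

A single occurrence is sufficient for selection here, in keeping with the remark quoted above; no frequency threshold is applied.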
1/ See, for example, Wyllys, 1963 [653], p. 22.
2/ See Luhn, 1959 [371], p. 52; [384], p. 8.
3/ Wyllys, 1963 [653], p. 15.
4/ See Ref. [578], p. [OCRerr]263. See also, among others, Luhn, 1959 [371], p. 52: "Just as
common words have been eliminated by look-up in a special index, certain essential
words may be looked up in another special index for the purpose of listing them under
any circumstances".