MONO91
NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report
Other Potentially Related Research
chapter
Mary Elizabeth Stevens
National Bureau of Standards
construction and revision of a mechanized thesaurus, as again Luhn has suggested. 1/
Schultz suggests that machine records should be maintained of what thesaurus terms are
actually used for indexing and searching, the frequencies of term usage, the
co-occurrences, the number of items described by particular combinations of terms, and
the like. 2/
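The kind of usage record described here can be sketched in a few lines. This is only an illustrative sketch, not a procedure given by Schultz or Luhn: the function name `usage_statistics`, the sample items, and the choice to count each term once per item are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def usage_statistics(indexed_items):
    """Tally term usage, pairwise co-occurrence, and items per term combination.

    indexed_items is a list of term lists, one list per indexed item
    (hypothetical input format chosen for this sketch).
    """
    term_freq = Counter()    # how many items use each term
    pair_freq = Counter()    # how many items use both terms of a pair
    combo_items = Counter()  # how many items carry exactly this set of terms
    for terms in indexed_items:
        unique = sorted(set(terms))  # count each term once per item
        term_freq.update(unique)
        pair_freq.update(combinations(unique, 2))
        combo_items[tuple(unique)] += 1
    return term_freq, pair_freq, combo_items


# Hypothetical sample data: four items and the terms assigned to each.
items = [
    ["indexing", "thesaurus"],
    ["indexing", "thesaurus", "statistics"],
    ["indexing", "statistics"],
    ["thesaurus", "indexing"],
]
term_freq, pair_freq, combo_items = usage_statistics(items)
print(term_freq["indexing"])                   # 4
print(pair_freq[("indexing", "thesaurus")])    # 3
print(combo_items[("indexing", "thesaurus")])  # 2
```

Pairs and combinations are stored in sorted order, so the same two terms always map to the same key regardless of the order in which an indexer assigned them.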
The potential combinations of natural text processing, automatic indexing, and
thesaurus construction and updating are stressed in many current programs. For
example, Eldridge and Dennis discuss:
"Indexing by machine from natural text in a fully automatic system, in which
statistical analysis of the words is employed as a device for (a) building auto-
matically a `concept' thesaurus, (b) indexing incoming documents with reference
to the thesaurus, and (c) continuously revising the thesaurus to reflect new word
usages in currently incoming documents." 3/
Similarly, Giuliano and Jones suggest that given a term-term statistical association
matrix, a transformation can be arrived at with a unit vector assigning value only to
index term Z that ranks every other index term according to degree of association with Z,
then by listing the higher ranked terms for each term Z, "a `thesaurus' listing can be
obtained completely automatically." 4/
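The ranking step can be illustrated with a small sketch. It assumes a symmetric association matrix held as a list of rows; the term names, the numerical values, and the function names `rank_associates` and `thesaurus_listing` are hypothetical, and the code is not taken from Giuliano and Jones. Multiplying the matrix by a unit vector that is 1 only at term Z simply selects the associations of every term with Z, which are then sorted.

```python
def rank_associates(terms, matrix, z, top=3):
    """Rank every other term by its association with term z."""
    idx = terms.index(z)
    # Unit vector assigning value only to index term z.
    unit = [1.0 if i == idx else 0.0 for i in range(len(terms))]
    # Matrix-vector product: one association score per term.
    scores = [sum(row[j] * unit[j] for j in range(len(terms))) for row in matrix]
    ranked = sorted(
        ((t, s) for t, s in zip(terms, scores) if t != z),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:top]


def thesaurus_listing(terms, matrix, top=2):
    """List the highest-ranked associates for each term."""
    return {z: [t for t, _ in rank_associates(terms, matrix, z, top)] for z in terms}


# Hypothetical terms and association values, for illustration only.
terms = ["index", "thesaurus", "matrix", "vector"]
association = [
    [1.0, 0.8, 0.1, 0.2],
    [0.8, 1.0, 0.3, 0.1],
    [0.1, 0.3, 1.0, 0.7],
    [0.2, 0.1, 0.7, 1.0],
]
listing = thesaurus_listing(terms, association)
print(listing["index"])   # ['thesaurus', 'vector']
print(listing["matrix"])  # ['vector', 'thesaurus']
```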
6.2 Statistical Association Techniques
A special definition of the word "thesaurus" might, as we have noted, include the
development of devices and techniques which either automatically or by man-machine
interaction serve to suggest the amplification of a set of index terms. We shall briefly
consider here both devices that visually display associations between words, terms, and
documents 5/ and techniques for machine use of coefficients of correlation for prior
co-occurrences in a collection of word-word, word-term, term-term, term-document, and
document-document associations, the statistical association factor technique as first
developed by Stiles.
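As a rough illustration of a coefficient of correlation computed from prior co-occurrences, the sketch below uses the ordinary phi coefficient for two terms over a collection of N documents. This is a stand-in chosen for simplicity and is not Stiles's association factor, whose formula is not given in this section; the function name `phi_coefficient` and the sample postings are assumptions.

```python
import math


def phi_coefficient(docs_a, docs_b, n_docs):
    """Phi correlation between two terms, given the sets of documents each indexes."""
    a = len(docs_a)           # documents indexed by the first term
    b = len(docs_b)           # documents indexed by the second term
    f = len(docs_a & docs_b)  # documents indexed by both terms
    denom = math.sqrt(a * b * (n_docs - a) * (n_docs - b))
    if denom == 0:
        return 0.0  # a term occurring in no document or in all of them carries no signal
    return (f * n_docs - a * b) / denom


# Hypothetical postings: document numbers in which each term occurs, out of 8 documents.
retrieval = {1, 2, 3, 4}
indexing = {1, 2, 3, 5}
print(phi_coefficient(retrieval, indexing, 8))  # 0.5
```

The same calculation applies to word-word, word-term, or document-document pairs once each member of the pair is represented by the set of contexts in which it occurs.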
1/
Luhn, 1957 [385], p. 316: "Provision should be made to register the number of
times each word is looked up in the index and the number of times each family
number has been used for encoding. Such a record would be an indispensable
part of the system for making periodic adjustments based on the usage of words
or notions as mechanically established."
2/
Schultz, 1962 [529], p. 104.
3/
Eldridge and Dennis, 1962 [183], p. 6.
4/
Giuliano and Jones, 1962 [229], p. 12.
5/
It should be noted that Tabledex, the Scan-Column Index, and similar tools
provide to some extent a display of prior associations between index terms. (See
pp. 25-27 of this report.) Thus Cheydleur (1963 [113], p. 58) remarks: "Ledley
has focussed on inter-item concepts in designing his economical TABLEDEX
arrangement for displaying the connectivity of index terms and related file items."