MONO91
NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report
Other Potentially Related Research
chapter
Mary Elizabeth Stevens
National Bureau of Standards
Additional features include provision for the matrix of coefficients of association
to change with time or with deliberate manipulation to improve performance. Thus:
"Each normalized cell weight. . . rises and falls with time as each specific
association increases or decreases in relative frequency. In this way, the
matrix memory of associations changes with time, maintaining a cumulative
pattern of associations reflecting one statistical characteristic of messages
fed into it in the past...
"In addition to this adaptive characteristic of changing memory with time and with
changing inputs, the matrix is also readily subject to formal education. Any
specific cell weight can be strengthened by repeatedly reading into the matrix
memory the specific strings that contain the desired associations. For example,
by introducing the strings is am, is are, am is, am are, and are am, we can
increase the statistical tendency of the tokens is, am, and are to be associated." 1/
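The adaptive behavior described in the quotation can be illustrated with a minimal sketch. It assumes, purely for illustration, that a cell weight is the relative frequency of a token pair among all pairs read so far; the normalization actually used by Spiegel et al. is not given here. The class name AssociationMatrix, its methods, and the sample input strings are hypothetical.

```python
from collections import Counter
from itertools import combinations


class AssociationMatrix:
    """Toy co-occurrence matrix whose normalized weights shift as strings are read in."""

    def __init__(self):
        self.counts = Counter()  # unordered token pair -> co-occurrence count
        self.total = 0           # total pair observations so far

    def read_string(self, tokens):
        """Record every unordered pair of distinct tokens in one input string."""
        for a, b in combinations(sorted(set(tokens)), 2):
            self.counts[(a, b)] += 1
            self.total += 1

    def weight(self, a, b):
        """Relative frequency of the pair (a, b) among all pairs seen so far (assumed normalization)."""
        if self.total == 0:
            return 0.0
        key = tuple(sorted((a, b)))
        return self.counts[key] / self.total


matrix = AssociationMatrix()

# Hypothetical strings standing in for messages fed into the matrix in the past.
for s in (["is", "radar"], ["am", "antenna"], ["are", "signal"], ["is", "are"]):
    matrix.read_string(s)
before = matrix.weight("is", "am")
radar_before = matrix.weight("is", "radar")

# "Formal education": repeatedly read in the strings quoted in the text.
for s in (["is", "am"], ["is", "are"], ["am", "is"], ["am", "are"], ["are", "am"]):
    matrix.read_string(s)
after = matrix.weight("is", "am")
radar_after = matrix.weight("is", "radar")
```

Under these assumptions the is/am weight rises after the added strings are read, while the weight of an unrelated pair such as is/radar falls, because every weight is relative to the total number of pairs seen.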
Experimental results have been obtained for a corpus of 500 bibliographic entries
contained in DDC's Title Announcement Bulletin. In the case of a three-term query, 40
items were selected and ranked in probable relevance order, with selection based on a
particular relevance score value threshold. The investigators then reviewed the abstracts
of all 500 items and rated them as to relevance with respect to the query. Seven
additional items were found, of which three would have been machine-selected with a
less stringent selection threshold. For the remaining four, it is reported that they "were
poorly indexed and could have been judged not relevant by a human who depended upon the
descriptor string only, as the matrix did, rather than upon review of the abstracts." 2/
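The effect of the selection threshold can be sketched as follows. The item names, scores, and threshold values are hypothetical and are not taken from the reported experiment, and the sketch does not attempt to reproduce how the relevance score itself was computed.

```python
def select_and_rank(scores, threshold):
    """Return the items whose score meets the threshold, highest score first."""
    selected = [(item, score) for item, score in scores.items() if score >= threshold]
    selected.sort(key=lambda pair: pair[1], reverse=True)
    return [item for item, _ in selected]


# Hypothetical relevance scores for four items against one query.
scores = {"item_a": 0.91, "item_b": 0.40, "item_c": 0.66, "item_d": 0.15}

strict = select_and_rank(scores, threshold=0.60)  # more stringent threshold
loose = select_and_rank(scores, threshold=0.35)   # less stringent threshold
```

Lowering the threshold admits item_b in addition to the two items already selected, which parallels the three additional items that a less stringent threshold would have machine-selected.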
6.3 Clues to Index-Term Selection from Automatic Syntactic Analysis
Several of the organizations and research teams most active in the investigation
of linguistic data processing techniques, especially for automatic indexing, extracting
and search renegotiation applications, are actively considering the use of clues derived
from automatic syntactic analysis to improve criteria for machine selection of
"significant" words, phrases, and sentences from raw text. In general, however, such
approaches are limited, in the first place, by the non-availability of sufficient corpora
of text in machine-usable form and, even more importantly, by the non-availability of
satisfactory computer programs for complete syntactic analysis up to the
1/ Spiegel et al, 1963 [566], p. 17.
2/ Ibid, p. 34.