MONO91
NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report
Other Potentially Related Research
chapter
Mary Elizabeth Stevens
National Bureau of Standards
providing for various normalizations that may be applied to compensate for word or
sentence frequency factors. These devices differ from the earlier EDIAC in the
variable weightings provided, in the normalizations that may be applied, and in multipath
interconnections.
When, for example, currents are applied at some of the word terminals, the voltages
appearing on any of the other word terminals depend on the strengths of association
between these words and the input words via all direct and indirect paths. The responses
of sentence terminals to the input words of a query similarly depend upon how strongly a
sentence is connected to these words and how strongly it is connected to other words
which in turn are strongly connected to the query words. It is to be noted further that:
"Pulling out or cutting a few randomly selected wires in an ACORN generally
has a surprisingly small effect. . . This insensitivity is of course, explainable
in terms of the multiplicity of indirect and redundant association paths which
remain intact when a direct path is severed. . . It. . . suggests that the retrieval
process can indeed be made insensitive to minor variations in indexing." 1/
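
The behavior described above can be imitated in software by treating each association as a conductance and relaxing the terminal voltages numerically. The following is a minimal sketch only, not a description of the ACORN hardware: the function name `solve_voltages`, the four word terminals, the unit conductances, and the leak to ground at every terminal are all assumptions introduced for illustration.

```python
def solve_voltages(wires, currents, leak=1.0, sweeps=200):
    """Relax terminal voltages of a resistive network (Gauss-Seidel sweeps).

    wires    -- {frozenset({a, b}): conductance}, the association strengths
    currents -- {terminal: injected current}, the query
    leak     -- assumed conductance from every terminal to ground
    """
    neighbors = {}
    for pair, g in wires.items():
        a, b = sorted(pair)
        neighbors.setdefault(a, {})[b] = g
        neighbors.setdefault(b, {})[a] = g
    for t in currents:
        neighbors.setdefault(t, {})
    volts = {t: 0.0 for t in neighbors}
    for _ in range(sweeps):
        for t, links in neighbors.items():
            # Kirchhoff's current law at terminal t, solved for its voltage.
            inflow = currents.get(t, 0.0) + sum(g * volts[u] for u, g in links.items())
            volts[t] = inflow / (leak + sum(links.values()))
    return volts


# Hypothetical word terminals; "turbine" has no direct wire to "jet".
wires = {
    frozenset({"jet", "engine"}): 1.0,
    frozenset({"jet", "aircraft"}): 1.0,
    frozenset({"engine", "turbine"}): 1.0,
    frozenset({"aircraft", "turbine"}): 1.0,
}
query = {"jet": 1.0}

before = solve_voltages(wires, query)

# Cut one wire and solve again.
cut = frozenset({"engine", "turbine"})
after = solve_voltages({p: g for p, g in wires.items() if p != cut}, query)

for word in ("engine", "aircraft", "turbine"):
    print(f"{word:9s} intact={before[word]:.3f} cut={after[word]:.3f}")
```

In this toy network "turbine" responds to the query on "jet" only through indirect paths, and it continues to respond, at a reduced level, after one of its two wires is cut. With so few wires the reduction is larger than it would be in a richly interconnected network of the kind the quotation describes.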
In addition, there are intriguing possibilities for imposing a "viewpoint" on a
search by injecting bias currents. Thus, if only jet plane items that are not
"Air Force" items were desired, the "Air Force" items could in effect be grounded out.
If the collection contained no jet items other than those that were also Air Force
items, these would still be indicated as responsive, but for the most part they would
appear only in that case. Some words are connected to almost all other words, but these
have little effect in the system, and the hardware thus tends to compensate for the high
frequencies of very general words.
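
The effect of a bias current can be sketched numerically in the same spirit. The example below is an illustration under stated assumptions rather than an account of the actual device: it models the "viewpoint" as a negative current injected at a hypothetical "air_force" word terminal (the text speaks of grounding out the items; a negative bias is one simple stand-in), and the function `terminal_voltages`, the three items, the unit conductances, and the leak to ground are all invented for the purpose.

```python
def terminal_voltages(wires, currents, leak=1.0, sweeps=200):
    """Gauss-Seidel relaxation of a resistive network with a leak to ground."""
    links = {}
    for a, b, g in wires:
        links.setdefault(a, {})[b] = g
        links.setdefault(b, {})[a] = g
    volts = {t: 0.0 for t in links}
    for _ in range(sweeps):
        for t, nbrs in links.items():
            inflow = currents.get(t, 0.0) + sum(g * volts[u] for u, g in nbrs.items())
            volts[t] = inflow / (leak + sum(nbrs.values()))
    return volts


# Hypothetical collection: item_A is indexed by both words,
# item_B by "jet" only, item_C by "air_force" only.
wires = [
    ("jet", "item_A", 1.0),
    ("jet", "item_B", 1.0),
    ("air_force", "item_A", 1.0),
    ("air_force", "item_C", 1.0),
]

# Query on "jet" alone, then the same query with a negative bias current.
plain = terminal_voltages(wires, {"jet": 1.0})
biased = terminal_voltages(wires, {"jet": 1.0, "air_force": -1.0})

for item in ("item_A", "item_B", "item_C"):
    print(f"{item} plain={plain[item]:.3f} biased={biased[item]:.3f}")
```

With the bias applied, the response of the item indexed by both words is cancelled to approximately zero, the jet-only item remains clearly responsive, and the Air Force-only item is driven negative. Because the network is linear, the biased result is simply the response to the query minus the response to a unit current at the bias terminal.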
6.2.5 Spiegel and Others at Mitre Corporation
Bennett and Spiegel, reporting at the Symposium on Optimum Routing in Large
Networks, IFIP Congress-1962, 2/ consider modifications to formulas for the calculation
of statistical association factors which will normalize against such influences as frequency
of word occurrences, relative word position within a string of words, and string length.
This work has been carried forward at the Mitre Corporation in a program for developing
procedures to encode various statistical properties of messages or documents and to use
these codes for message routing and retrieval.
Differences between this approach and those of Maron and Kuhns, Stiles, and Doyle
relate primarily to the question of how best to normalize. The objective is closely
similar: to use associational weighting so as to provide, in response to a query, output of
documents or messages ranked in order of probable relevance to the query.
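
The general idea of frequency-normalized association followed by ranked output can be sketched as follows. The formulas actually proposed by Bennett and Spiegel are not reproduced here, so this sketch substitutes an assumed cosine-style normalization (co-occurrence count divided by the square root of the product of the two word frequencies) and a simple division by message length; word position within the string is not modeled. The three messages and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

# Hypothetical messages, each reduced to a list of index words.
messages = {
    "m1": ["jet", "engine", "thrust"],
    "m2": ["jet", "aircraft", "engine", "fuel"],
    "m3": ["fuel", "supply", "depot"],
}


def association_factors(messages):
    """Co-occurrence counts normalized against word frequency (assumed form)."""
    freq = Counter()   # number of messages containing each word
    pairs = Counter()  # number of messages containing both words of a pair
    for words in messages.values():
        unique = sorted(set(words))
        freq.update(unique)
        pairs.update(combinations(unique, 2))
    return {p: n / sqrt(freq[p[0]] * freq[p[1]]) for p, n in pairs.items()}


def association(factors, a, b):
    """Association between two words; a word is fully associated with itself."""
    if a == b:
        return 1.0
    return factors.get(tuple(sorted((a, b))), 0.0)


def rank(query, messages, factors):
    """Order message names by length-normalized association with the query."""
    scores = {}
    for name, words in messages.items():
        total = sum(association(factors, q, w) for q in query for w in words)
        scores[name] = total / (len(query) * len(words))
    return sorted(scores, key=scores.get, reverse=True)


factors = association_factors(messages)
ranking = rank(["thrust"], messages, factors)
print(ranking)
```

For the query word "thrust", message m2 does not contain the word at all, yet it is ranked ahead of m3 because "jet" and "engine" are statistically associated with "thrust" in this small collection.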
1/ Giuliano and Jones, 1962 [229], p. 22.
2/ See Juncosa, 1962 [306], especially paper 4, E. Bennett and J. Spiegel,
"Document and message routing through communication content analysis",
pp. 718-719.