MONO91
NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report
Other Potentially Related Research
chapter
Mary Elizabeth Stevens
National Bureau of Standards
the southwestern states. 1/ A joint American Bar Foundation-IBM research program has
been established to explore both text searching without prior indexing and automatic
indexing techniques (Eldridge and Dennis, 1962 [183], 1963 [182]).
In the Horty-Pittsburgh System, approximately 6,000,000 words of text have been
converted via Flexowriter to magnetic tape. An exclusion dictionary of 100 words is used
to eliminate the most common words and a word-concordance is prepared, resulting in
word-occurrence location indicia by position in sentence, paragraph and section of the
statute. In searching, the user has available to him the alphabetized list of approximately
17,000 different words and it is up to him to think of the words and synonyms most likely
to occur in statute sections likely to be the ones he seeks. Several search logics are
available. One provides that at least one of a group of alternate words must appear;
another requires that at least one from two or more groups must appear in the same
sentence. Intra-sentence distance criteria are also utilized: "If the phrase `born out of
wedlock' is sought, the operator... requires that the word `wedlock' appear in the same
sentence, no more than three words after `born'." 2/
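The indexing and search logics described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Horty-Pittsburgh implementation: the exclusion list, the sample statute text, and the function names (`build_concordance`, `any_of`, `all_groups`, `within`) are all hypothetical, locations are simplified to (section, sentence, word position), and the multi-group logic is read here as "at least one word from each group in the same sentence".

```python
import re
from collections import defaultdict

# Hypothetical exclusion dictionary; the report says only that about 100
# common words are excluded, not which ones.
EXCLUDED = {"a", "an", "and", "in", "is", "of", "out", "the", "this", "to"}

def build_concordance(sections):
    """Map each non-excluded word to its (section, sentence, position) locations."""
    concordance = defaultdict(list)
    for sec_id, text in sections.items():
        sentences = [s for s in re.split(r"[.;]\s*", text) if s]
        for sent_no, sentence in enumerate(sentences):
            words = re.findall(r"[a-z']+", sentence.lower())
            # Positions count every word, so excluded words still add distance.
            for pos, word in enumerate(words):
                if word not in EXCLUDED:
                    concordance[word].append((sec_id, sent_no, pos))
    return concordance

def any_of(concordance, group):
    """Sentences containing at least one word of the group."""
    return {(sec, sent) for w in group for sec, sent, _ in concordance.get(w, [])}

def all_groups(concordance, groups):
    """Sentences containing at least one word from every group (assumed reading)."""
    hits = [any_of(concordance, g) for g in groups]
    return set.intersection(*hits) if hits else set()

def within(concordance, first, second, max_gap):
    """Sentences where `second` occurs 1..max_gap words after `first`."""
    first_positions = defaultdict(list)
    for sec, sent, pos in concordance.get(first, []):
        first_positions[(sec, sent)].append(pos)
    result = set()
    for sec, sent, pos in concordance.get(second, []):
        if any(0 < pos - p <= max_gap for p in first_positions.get((sec, sent), [])):
            result.add((sec, sent))
    return result

# Invented sample text, for illustration only.
sections = {
    "S1": "A child born out of wedlock may inherit from the mother. The court shall decide.",
    "S2": "Illegitimate children are entitled to support.",
    "S3": "A person born in this state to parents joined in wedlock is a citizen.",
}
conc = build_concordance(sections)
print(within(conc, "born", "wedlock", 3))
print(any_of(conc, ["illegitimate", "bastardy"]))
print(all_groups(conc, [["child", "children"], ["wedlock", "illegitimate"]]))
```

In this sketch the distance test accepts section S1, where `wedlock` follows `born` by exactly three words, and rejects S3, where both words occur in one sentence but eight words apart. Because positions are counted over all words, the excluded words `out` and `of` still contribute to the distance even though they are not indexed.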
Obviously, for the same question the searcher would also have to specify synonymous
words and phrases: "illegitimate children", "illegitimate births", "unwed mothers",
"unmarried mothers", "illegitimacy", "bastardy", and so on. The reported success of
the system is apparently due in large part to the ingenuity of the searchers in specifying
the expressions and synonyms most likely to be used. Hughes comments as follows:
"It should be noted that this system will be most efficient only when the users
are thoroughly familiar with the linguistic style of the source material and
search is made on words known to occur in the appropriate statutes" 3/
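The dependence on the searcher's vocabulary can be made concrete with a small sketch. The statute fragments below are invented for illustration, the helper names (`normalize`, `sections_matching`) are hypothetical, and whole-phrase matching is used as a simplification in place of the concordance-based logic; the alternate phrases are those listed in the text.

```python
import re

# Invented statute fragments, for illustration only.
SECTIONS = {
    "S1": "A child born out of wedlock may inherit from the mother.",
    "S2": "Proceedings in bastardy shall be brought in the county court.",
    "S3": "Unmarried mothers may petition for support of the child.",
}

# Alternates the searcher would have to think of, as listed in the text.
ALTERNATES = [
    "illegitimate children", "illegitimate births", "unwed mothers",
    "unmarried mothers", "illegitimacy", "bastardy",
]

def normalize(text):
    """Lowercase the text and reduce it to space-separated words."""
    return " ".join(re.findall(r"[a-z']+", text.lower()))

def sections_matching(phrases, sections):
    """Ids of sections containing at least one of the phrases as whole words."""
    result = set()
    for sec_id, text in sections.items():
        padded = f" {normalize(text)} "
        if any(f" {normalize(p)} " in padded for p in phrases):
            result.add(sec_id)
    return result

narrow = sections_matching(["born out of wedlock"], SECTIONS)
broad = sections_matching(["born out of wedlock"] + ALTERNATES, SECTIONS)
print(sorted(narrow), sorted(broad))
```

With these sample sections, the single-phrase query finds only S1, while the query widened with the alternates also finds S2 and S3. Any relevant section phrased in terms the searcher did not anticipate would still be missed.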
6.5 Other Examples of Related Research in Linguistic Data Processing
Since, as Garvin has emphasized, "All areas of linguistic information processing
are concerned with the treatment of the content, rather than merely the form, of
documents composed in a natural language," much of the research in linguistic data
processing is potentially applicable to both the development and the improvement of
automatic indexing techniques. Thus developments in automatic content analysis, in
psycholinguistics, and in question-answering systems may eventually find application to
mechanized indexing systems.
1/ Eldridge and Dennis, 1964 [182], p. 90; Wilson, 1962 [645].
2/ Horty, 1962 [278], pp. 59-60.
3/ Hughes, 1962 [284], p. IV-6 to IV-8.