ISR11
Scientific Report No. ISR-11 Information Storage and Retrieval
SOCCER - A Concordance Program
chapter
Guy E. Hochgesang
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
the context in which it occurs, and the number of the card
in which it is included. If a restriction or selection
list has been specified by the use of a RESTRICT or SELECT
control card, this list is scanned before the token is written
on the intermediate tape. If the RESTRICT card is used, tokens
appearing on the restriction list are not written on SMRTAP;
if a selection list is specified by a SELECT card, only
those tokens appearing on the selection list are written on
SMRTAP. Thus the use of a restriction list provides a method
for excluding common or unwanted words ("the", "of", "and", etc.)
from the concordance, while the use of a selection list enables
one to include only certain words in the concordance, excluding
all others. Only one of the two types of lists may be active
during a single run. It is permissible to use neither type
of list, i.e., to use neither the RESTRICT nor the SELECT control
card.
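
The effect of the two list types can be sketched in a few lines of Python. This is only an illustration of the rule stated above, not the original program: the function name filter_tokens, the (word, context, card number) tuple layout, and the use of in-memory sets in place of tape records are all assumptions.

```python
from typing import Iterable, Iterator, Optional, Set, Tuple

# Hypothetical token record: (word, context, card number).
Token = Tuple[str, str, int]


def filter_tokens(
    tokens: Iterable[Token],
    restrict: Optional[Set[str]] = None,
    select: Optional[Set[str]] = None,
) -> Iterator[Token]:
    """Yield the tokens that would be written on the intermediate tape."""
    # Only one of the two list types may be active during a single run.
    if restrict is not None and select is not None:
        raise ValueError("RESTRICT and SELECT cannot both be active")
    for token in tokens:
        word = token[0]
        # RESTRICT: tokens on the restriction list are not written.
        if restrict is not None and word in restrict:
            continue
        # SELECT: only tokens on the selection list are written.
        if select is not None and word not in select:
            continue
        yield token
```

With neither list supplied, every token passes through unchanged, which corresponds to a run with no RESTRICT or SELECT card.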
When SOCCER encounters a card with "*STOP" left-justified in columns 1-6,
or an end-of-file on the INPUT tape, the assumption is made that the entire
text has been processed. The tokens appearing on SMRTAP are then sorted into
alphabetical order, using the scratch tapes, as described in Part 3. After
SMRTAP has been sorted, some control information about the sort is written on the
OUTPUT tape; this is further described in Part 6B. The concordance is then
written on the OUTPUT tape following the complete listing of the text already
produced. The concordance consists of an alphabetical listing of the tokens
on SMRTAP, along with the context in which each token originally occurred
and the number of the card from which it was taken. The number of
occurrences (i.e., number of tokens) of each type is also given. Figures 1
and 2 show fragmentary examples of a typical text listing and concordance.
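
As a rough modern analogue of the sort and concordance steps, the following Python sketch orders the tokens alphabetically, groups them by type, and counts the occurrences of each type. It assumes tokens are held in memory as (word, context, card number) tuples; the name build_concordance is hypothetical, and an in-memory sort stands in for the tape sort described in Part 3.

```python
from itertools import groupby
from operator import itemgetter
from typing import Iterable, List, Tuple

# Hypothetical token record: (word, context, card number).
Token = Tuple[str, str, int]
# One concordance entry: (type, number of occurrences, [(context, card), ...]).
Entry = Tuple[str, int, List[Tuple[str, int]]]


def build_concordance(tokens: Iterable[Token]) -> List[Entry]:
    """Sort tokens alphabetically and group them by type."""
    # sorted() is stable, so tokens of the same type keep their text order.
    ordered = sorted(tokens, key=itemgetter(0))
    entries: List[Entry] = []
    for word, group in groupby(ordered, key=itemgetter(0)):
        occurrences = [(context, card) for _, context, card in group]
        entries.append((word, len(occurrences), occurrences))
    return entries
```

Each entry carries the information the concordance is described as listing: the type, its number of occurrences, and the context and card number of every token of that type.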
While writing the concordance on the OUTPUT tape, SOCCER keeps track
of some useful statistics, which are written out following the concordance