Scientific Report No. ISR-11
Information Storage and Retrieval
SOCCER - A Concordance Program
Guy E. Hochgesang
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
assignment of a character to one of these sets, and does not wish to use the
ALPH or SPEC control cards on every run of SOCCER, these tables may be
changed permanently. This is done by altering the transfer address for the
appropriate character.
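The table-driven assignment described above can be illustrated with a small sketch. It is a minimal illustration under stated assumptions, not SOCCER's actual code: the class names, the default character sets, the third "delimiter" class, and the one-line "ALPH <chars>" / "SPEC <chars>" card format are all hypothetical, and a dictionary lookup stands in for the per-character transfer address. Editing `DEFAULT_CLASS` corresponds to a permanent change of the tables; `apply_control_card` corresponds to a change made by control card for a single run.

```python
# Minimal sketch of table-driven character classification.
# Assumptions (not from the report): the class names, the default
# character sets, and the one-line "ALPH <chars>" / "SPEC <chars>" card format.
ALPHABETIC = "alphabetic"
SPECIAL = "special"
DELIMITER = "delimiter"  # hypothetical: characters that end a token

# Default table: each character maps to its class. The dictionary lookup
# stands in for the per-character transfer address; editing this table
# corresponds to a permanent change.
DEFAULT_CLASS = {c: ALPHABETIC for c in "ABCDEFGHIJKLMNOPQRSTUVWXYZ"}
DEFAULT_CLASS.update({c: SPECIAL for c in "0123456789$*"})
DEFAULT_CLASS.update({c: DELIMITER for c in " ,.()-"})


def apply_control_card(table, card):
    """Return a copy of `table` with the characters on an ALPH or SPEC
    card reassigned, leaving the default table untouched (a one-run change)."""
    kind, _, chars = card.partition(" ")
    new_class = {"ALPH": ALPHABETIC, "SPEC": SPECIAL}[kind]
    updated = dict(table)
    for c in chars:
        updated[c] = new_class
    return updated


def tokenize(table, text):
    """Split text into tokens: maximal runs of non-delimiter characters.
    Characters missing from the table are treated as delimiters."""
    tokens, current = [], []
    for c in text:
        if table.get(c, DELIMITER) == DELIMITER:
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(c)
    if current:
        tokens.append("".join(current))
    return tokens


run_table = apply_control_card(DEFAULT_CLASS, "ALPH -")
print(tokenize(DEFAULT_CLASS, "CO-OP STORE"))  # ['CO', 'OP', 'STORE']
print(tokenize(run_table, "CO-OP STORE"))      # ['CO-OP', 'STORE']
```

In this sketch, moving the hyphen into the alphabetic set for one run makes "CO-OP" a single token, while the default table still splits it in two.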
B. Timing
The time SOCCER takes to process a text depends on the length of the text,
the number of tokens included in the concordance, the tape assignments, and
the order of the merge used in the tape sort. In general, processing is faster
when the input tape is on channel A and the highest possible merge order is
used for the sort.
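Why a higher merge order helps can be seen from the usual count of merge passes: if the sort starts with r sorted runs and each pass merges groups of k runs into one, about the ceiling of log base k of r passes over the tapes are needed, and each pass reads and writes every record once. The sketch below assumes an idealized balanced k-way merge and an arbitrary figure of 200 initial runs; neither comes from the report, and SOCCER's tape sort may distribute its runs differently.

```python
def merge_passes(runs, order):
    """Passes needed for a balanced `order`-way merge to reduce `runs`
    sorted runs to one. Idealized model; SOCCER's tape sort may differ."""
    if order < 2:
        raise ValueError("merge order must be at least 2")
    passes = 0
    while runs > 1:
        # Each pass merges groups of `order` runs into one (ceiling division).
        runs = -(-runs // order)
        passes += 1
    return passes


# Hypothetical example: 200 initial runs.
for k in (2, 4, 6):
    print(k, merge_passes(200, k))
# prints:
# 2 8
# 4 4
# 6 3
```

Under these assumptions, raising the merge order from two to four halves the number of passes over the tapes, while further increases give smaller gains; the highest usable order is presumably limited by the number of tape units available.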
If the merge order is held constant, the processing time is roughly linear in
the number of tokens in the text. Most runs of SOCCER to date have used a merge
of order four and a restriction list that excluded between [?]0 and [?]0 per
cent of the tokens from the concordance. Under these conditions, a text of
33,0[?]2 tokens (4318 cards) took 12.7 minutes of execution time and produced
597 pages of output. A text of 113,130 tokens (13,356 cards) produced 1770 pages
of output in 36.9 minutes of execution time, listing 7925 types with a total of
58,190 tokens; the remaining 54,940 tokens were on the restriction list.
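The figures reported above can be checked with a little arithmetic. The sketch uses only the card counts, execution times, page counts, and token counts given in the text; because the first run's token count is partly illegible, the two runs are compared by cards rather than tokens. The name `rates` is illustrative only.

```python
# Figures as reported for the two SOCCER runs (merge order four).
runs = {
    "first": {"cards": 4318, "minutes": 12.7, "pages": 597},
    "second": {"cards": 13356, "minutes": 36.9, "pages": 1770},
}


def rates(run):
    """Cards processed and pages printed per minute of execution time."""
    return run["cards"] / run["minutes"], run["pages"] / run["minutes"]


for name, run in runs.items():
    cards_per_min, pages_per_min = rates(run)
    print(f"{name}: {cards_per_min:.0f} cards/min, {pages_per_min:.1f} pages/min")
# first: 340 cards/min, 47.0 pages/min
# second: 362 cards/min, 48.0 pages/min

# Second run: tokens excluded by the restriction list.
total_tokens, listed_tokens = 113_130, 58_190
excluded = total_tokens - listed_tokens
print(excluded, f"{excluded / total_tokens:.1%}")  # 54940 48.6%
```

Both runs come out at roughly 340 to 360 cards and about 47 to 48 pages per minute, which is consistent with the roughly linear relation stated above; in the second run the restriction list accounted for 54,940 of 113,130 tokens, about 48.6 per cent.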