Scientific Report No. ISR-11
Information Storage and Retrieval

SOCCER - A Concordance Program

Guy E. Hochgesang
Harvard University
Gerard Salton

Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

2. The Concordance

A. Definitions

SOCCER divides the standard character set into two groups: alphabetic characters and special characters. Alphabetic characters are defined to be the characters included in the concordance, while special characters are those characters to be ignored during the generation of a concordance. A token is defined to be any string of consecutive alphabetic characters delimited by special characters, while a type is defined to be a class of identical tokens. As an example, if one defines the alphabetic characters as the letters of the alphabet, the string of characters "to be or not to be" contains the six tokens to, be, or, not, to, and be, but it contains only the four types to, be, or, and not.

B. The Input Text

The input text to SOCCER should be punched on cards in columns 1-72. Columns 73-80 of the cards are ordinarily ignored and may be blank or contain serial numbers. These cards must then be transferred to the INPUT tape in unblocked BCD records of thirteen or more machine words. No special typing conventions are necessary in punching the text cards. The end of the text must be indicated by a card with "*STOP" punched left-justified in columns 1-6, or by an end-of-file on the INPUT tape following the last card of the text.

C. Processing the Text

Before starting to process the input text, SOCCER first reads the control cards from A2. (An explanation of the control cards will be found in Part [illegible] of this report.) When the START control card is found, processing