Scientific Report No. ISR-11
Information Storage and Retrieval
SOCCER - A Concordance Program
Guy E. Hochgesang
Harvard University
Gerard Salton

Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

of the text begins from the INPUT tape. As the cards of the text are read in, they are numbered sequentially and then written out on the OUTPUT tape to provide a listing of the text. Cards which have a "*" (the "space control character") in column one are listed double-spaced; i.e., a blank line precedes their listing. In addition, cards which have either a "*" or a "$" (the "skip control character") are not included in the concordance. In effect this causes cards with an asterisk or dollar sign in column one to be interpreted as comment cards.

Any card which does not have the skip control character or space control character in column one is included in the concordance. These cards are scanned for the tokens of the concordance in the following steps:

1. The right-most non-blank character is found. If this character is not a hyphen (a minus sign), step 2 is taken. If this character is a hyphen, the character and all blanks to the right of it are deleted. The next card is scanned from left to right for alphabetic characters, with the scan terminating at the first special character. These alphabetic characters (if any) are then appended to the card with the hyphen, and step 2 is taken. This procedure allows one to hyphenate words from one card to another, provided that the hyphen follows immediately after the last alphabetic character on the first card and that the syllable on the second card starts in column one. Such hyphenated words appear in the concordance with the syllables properly joined together and the hyphen deleted.

2. If n consecutive blanks appear on the card, n-1 of these blanks are deleted to allow as much significant context as possible to be included with the tokens in step 3.

3. The card is then scanned for tokens. As each token is found it is written out on an intermediate tape, SMRTAP, along with