ISR11
Scientific Report No. ISR-11 Information Storage and Retrieval
Operating Instructions for the SMART Text Processing and Document Retrieval System
M. E. Lesk
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
25-29, the next in 30-34, then in 35-39, and so on. If there are fewer
than six concept numbers, the remaining fields should be left blank.
No concept number should exceed 32767. Concept numbers above 32000
(and the concept number 0) are reserved for non-significant words. The
concept numbers 1-32000 are significant concepts, used in the correlation
process.
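The concept-number layout described above can be sketched in Python. This is only an illustration and not part of the SMART programs: the names parse_concept_fields and is_significant are hypothetical, the card is modeled as an 80-character string, and the numbers are assumed to be right-adjusted decimal digits within each five-column field.

```python
CONCEPT_START_COL = 25      # first concept field begins in column 25 (1-based)
CONCEPT_FIELD_WIDTH = 5     # fields are 25-29, 30-34, 35-39, ...
MAX_CONCEPT_FIELDS = 6      # at most six concept numbers per card
MAX_CONCEPT_NUMBER = 32767  # no concept number may exceed this
MAX_SIGNIFICANT = 32000     # 1-32000 are significant concepts


def parse_concept_fields(card):
    """Return the concept numbers found in columns 25-54 of one card image."""
    card = card.ljust(80)  # pad short lines to a full 80-column card
    concepts = []
    for i in range(MAX_CONCEPT_FIELDS):
        start = CONCEPT_START_COL - 1 + i * CONCEPT_FIELD_WIDTH
        field = card[start:start + CONCEPT_FIELD_WIDTH].strip()
        if not field:
            continue  # blank field: no concept number punched here
        if not field.isdigit():
            raise ValueError("non-numeric concept field: %r" % field)
        number = int(field)
        if number > MAX_CONCEPT_NUMBER:
            raise ValueError("concept number exceeds 32767: %d" % number)
        concepts.append(number)
    return concepts


def is_significant(number):
    """True for concept numbers 1-32000; 0 and 32001-32767 are non-significant."""
    return 1 <= number <= MAX_SIGNIFICANT
```

For example, a card with "  123", "32001", and "    0" punched in columns 25-39 yields the list [123, 32001, 0], of which only 123 is a significant concept.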
Up to eight syntactic codes may also be punched for each stem. They
are punched in columns 55-78 in three-column numeric fields, right adjusted
within each field but using the leftmost columns first. Each syntactic
code is a number less than 256. As currently implemented, these numbers
correspond to partial stem homographs for the Kuno syntactic analyzer.
The correspondence between syntactic code numbers and partial homographs
is shown in Table 4. The homograph is completed by combining the partial
stem homograph with the partial suffix homograph of the syntactic suffix
list (5.3.1) as explained in [13].
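As a rough illustration of the syntactic-code layout just described, the following Python sketch reads the eight three-column fields in columns 55-78. It is not taken from the SMART programs: the name parse_syntactic_codes is hypothetical, and the card is modeled as an 80-character string with blank fields meaning "no code".

```python
SYNTAX_START_COL = 55   # first syntactic-code field begins in column 55 (1-based)
SYNTAX_FIELD_WIDTH = 3  # three-column numeric fields
MAX_SYNTAX_FIELDS = 8   # columns 55-78 hold at most eight codes
SYNTAX_CODE_LIMIT = 256 # each code must be less than 256


def parse_syntactic_codes(card):
    """Return the syntactic codes found in columns 55-78 of one card image."""
    card = card.ljust(80)  # pad short lines to a full 80-column card
    codes = []
    for i in range(MAX_SYNTAX_FIELDS):
        start = SYNTAX_START_COL - 1 + i * SYNTAX_FIELD_WIDTH
        field = card[start:start + SYNTAX_FIELD_WIDTH].strip()
        if not field:
            continue  # blank field: no code punched here
        if not field.isdigit():
            raise ValueError("non-numeric syntactic code field: %r" % field)
        code = int(field)
        if code >= SYNTAX_CODE_LIMIT:
            raise ValueError("syntactic code must be less than 256: %d" % code)
        codes.append(code)
    return codes
```

A card with " 12" in columns 55-57 and "255" in columns 58-60 gives [12, 255]; a field containing 256 is rejected.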
The word stems on the input cards should be in correct BCD alphabetical
order. They must be correctly ordered by the first letter, as the programs
are currently arranged; the degree of order necessary depends on the size
of the dictionary. The thesaurus must end with a card containing a
multipunch in column 1, and *END* in columns 2-6.
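A deck could be checked against these two requirements before submission. The Python sketch below is an assumption-laden illustration, not SMART code: the names is_end_card and first_out_of_order are hypothetical, the multipunch in column 1 is represented by whatever single character the card reader substitutes, and ordinary string comparison of uppercase letters stands in for the BCD collating sequence (which differs for digits and special characters).

```python
def is_end_card(card):
    """True if columns 2-6 of the card image contain *END*.

    Column 1 (the multipunch) is not inspected, since its representation
    as a character depends on the reader; that is an assumption here.
    """
    return card.ljust(80)[1:6] == "*END*"


def first_out_of_order(stems):
    """Return the index of the first stem whose first letter sorts before
    the first letter of the preceding stem, or None if none does.

    Only the first letter is compared, the minimum ordering the text requires.
    """
    for i in range(1, len(stems)):
        if stems[i][:1] < stems[i - 1][:1]:
            return i
    return None
```

For instance, the stems ABLE, ACT, BOOK pass the check, while BOOK followed by ABLE is reported at index 1.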
5.1.2. Statistical Phrase Dictionary
The second file on the library tape contains the first of the phrase
dictionaries, the statistical phrase dictionary. The first card of the
statistical phrase deck (following the last card of the thesaurus update
section) is a control card specifying the use to be made of the old file
on tape A6. Columns 1-6 of this card should be either REPLAC, indicating