Scientific Report No. ISR-11
Information Storage and Retrieval

Operating Instructions for the SMART Text Processing and Document Retrieval System
M. E. Lesk
Harvard University
Gerard Salton

Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

the number given must not have a decimal point (2, not 2.0). If the value of a specification is allowed to be a fraction, it must be punched as a FORTRAN floating-point number, i.e. it must have a decimal point (or the letter E). For example, STATWT 1 is incorrect; the specification must read STATWT 1. or STATWT 1.0. In Table 2, x indicates a floating-point number and n indicates an integer. If a specification appears more than once, only the last occurrence is used. Specifications may occur in any order, except that the last specification (and only the last) must be either STOP or X.

3.1. Specifications Affecting Lookup

The SMART lookup uses a stem dictionary together with a suffix list for an accurate determination of both semantic and syntactic roles. The dictionary is stored in a semi-tree format, as is the suffix list. Details of the lookup may be found in references [11] and [12]. The lookup is accurate enough that if, for example, HOPE and HOP are two stems in the dictionary, HOPPING will be found as HOP + P + ING while HOPING will be found as HOPE - E + ING; also, if EASY and EASE are stems, EASIER is found from EASY while EASING is found from EASE.

The specifications associated with the lookup are:

   ENGTXT   causes the English text being looked up to be printed;
   NOTFND   causes the words not found in the dictionary to be printed;
   PUNCH    causes document vectors to be punched out (with phrases, if searched for) in binary form. This saves time in future runs, since the lookup need not be repeated.
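
The punching rules for specification values stated before Section 3.1 (integers without a decimal point, fractions with a decimal point or the letter E, last occurrence wins, STOP or X last) can be illustrated with a short modern sketch. It is written in Python and is not part of SMART. The one-specification-per-card layout "NAME value", the function names, the treatment of ENGTXT as a specification that takes no value, and the integer specification MAXDOC are assumptions made for illustration only; of the names used, only STATWT, ENGTXT, STOP and X come from the text.

```python
import re

# Integer: optional sign and digits only (2, not 2.0).
_INT = re.compile(r"[+-]?\d+\Z")
# Number shape accepted here for a floating-point value: digits with an
# optional decimal point and an optional E exponent.
_FLOAT = re.compile(r"[+-]?(\d+\.\d*|\.\d+|\d+)(E[+-]?\d+)?\Z", re.IGNORECASE)


def is_fortran_float(text):
    """True if text is a number that has a decimal point or the letter E."""
    return bool(_FLOAT.match(text)) and ("." in text or "E" in text.upper())


def read_specs(cards, kinds):
    """Check a list of specification cards and return their final values.

    kinds maps a specification name to "n" (integer), "x" (floating point)
    or "flag" (no value), following the n / x notation of Table 2.
    """
    if not cards or cards[-1].split()[:1] not in (["STOP"], ["X"]):
        raise ValueError("the last specification must be STOP or X")
    specs = {}
    for card in cards[:-1]:
        fields = card.split()
        if not fields:
            continue  # skip blank cards (an assumption of this sketch)
        name, args = fields[0], fields[1:]
        if name in ("STOP", "X"):
            raise ValueError("STOP or X may appear only as the last specification")
        if name not in kinds:
            raise ValueError("unknown specification: " + name)
        kind = kinds[name]
        if kind == "flag":
            if args:
                raise ValueError(name + " takes no value")
            specs[name] = True
        elif len(args) != 1:
            raise ValueError(name + " needs exactly one value")
        elif kind == "n":
            if not _INT.match(args[0]):
                raise ValueError(name + " needs an integer without a decimal point")
            specs[name] = int(args[0])
        else:
            if not is_fortran_float(args[0]):
                raise ValueError(name + " needs a decimal point or the letter E")
            specs[name] = float(args[0])
        # A repeated name simply overwrites the earlier entry, so only
        # the last occurrence is kept.
    return specs


# MAXDOC is a hypothetical integer specification used only for this example.
kinds = {"STATWT": "x", "MAXDOC": "n", "ENGTXT": "flag"}
print(read_specs(["STATWT 1.", "ENGTXT", "MAXDOC 2", "STATWT 2.5", "STOP"], kinds))
# {'STATWT': 2.5, 'ENGTXT': True, 'MAXDOC': 2}
```

In this sketch the card STATWT 1 is rejected with an error, as the text requires, while STATWT 1. and STATWT 1.0 are both accepted.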
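
The stem and suffix analyses quoted in Section 3.1 (HOPPING, HOPING, EASIER, EASING) can be reproduced by a small sketch, again in Python. It is not the SMART lookup: it uses plain sets in place of the semi-tree dictionary and suffix list, it knows only the three spelling changes that the examples illustrate, and the order in which the changes are tried is a heuristic chosen so that these four words come out as the text describes. The function name, the suffix list (ING, ER) and the format of the analysis string are assumptions.

```python
VOWELS = set("AEIOU")


def lookup(word, stems, suffixes):
    """Return (stem, analysis) for word, or None if no analysis is found."""
    # Try longer suffixes first.
    for suffix in sorted(suffixes, key=len, reverse=True):
        if not word.endswith(suffix) or len(word) <= len(suffix):
            continue
        base = word[: -len(suffix)]
        # Final consonant doubled before the suffix: HOPP -> HOP + P.
        if (len(base) >= 2 and base[-1] == base[-2]
                and base[-1] not in VOWELS and base[:-1] in stems):
            return base[:-1], "%s + %s + %s" % (base[:-1], base[-1], suffix)
        # Final E dropped before the suffix: HOP -> HOPE - E.
        if base + "E" in stems:
            return base + "E", "%sE - E + %s" % (base, suffix)
        # Final Y changed to I before the suffix: EASI -> EASY.
        if base.endswith("I") and base[:-1] + "Y" in stems:
            stem = base[:-1] + "Y"
            return stem, "%s - Y + I + %s" % (stem, suffix)
        # Stem used unchanged.
        if base in stems:
            return base, "%s + %s" % (base, suffix)
    if word in stems:
        return word, word
    return None


stems = {"HOPE", "HOP", "EASY", "EASE"}
suffixes = {"ING", "ER"}
for word in ("HOPPING", "HOPING", "EASIER", "EASING"):
    print(word, "->", lookup(word, stems, suffixes)[1])
# HOPPING -> HOP + P + ING
# HOPING -> HOPE - E + ING
# EASIER -> EASY - Y + I + ER
# EASING -> EASE - E + ING
```

Trying the dropped-E form before the unchanged stem is what keeps HOPING from being analysed as HOP + ING. A stem pair such as SING and SINGE would defeat this simple ordering, which is one reason a production lookup needs more detailed rules of the kind described in references [11] and [12].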