Scientific Report No. ISR-11 Information Storage and Retrieval
Operating Instructions for the SMART Text Processing and Document Retrieval System
M. E. Lesk
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
the number given must not have a decimal point (2, not 2.0). If the value
of a specification is allowed to be a fraction, it must be punched as a
FORTRAN floating-point number, i.e. it must have a decimal point (or the
letter E). For example, STATWT 1 is incorrect; the specification must
read STATWT 1. or STATWT 1.0. In Table 2, x indicates a floating-point
number and n indicates an integer.
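
The integer and floating-point rules above can be checked mechanically. The following Python sketch is an illustration only, not part of SMART: the function name `is_valid_value` and the two patterns are assumptions, written to accept integers without a decimal point and FORTRAN-style numbers with a decimal point or the letter E.

```python
import re

# Hypothetical patterns: the report states only that integers carry no
# decimal point and that fractions need a decimal point or the letter E.
_INT_RE = re.compile(r"[+-]?\d+")
_FLOAT_RE = re.compile(
    r"[+-]?(?:\d+\.\d*|\.\d+)(?:E[+-]?\d+)?"  # has a decimal point
    r"|[+-]?\d+E[+-]?\d+"                     # no point, but has E
)


def is_valid_value(token: str, kind: str) -> bool:
    """Check a punched value against its kind from Table 2.

    kind is 'n' for an integer or 'x' for a floating-point number.
    """
    if kind == "n":
        return _INT_RE.fullmatch(token) is not None
    if kind == "x":
        return _FLOAT_RE.fullmatch(token) is not None
    raise ValueError("kind must be 'n' or 'x'")
```

Under these assumptions, `is_valid_value("1", "x")` is false, matching the incorrect STATWT 1, while `"1."` and `"1.0"` are both accepted.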
If a specification appears more than once, only the last occurrence
is used. Specifications may occur in any order, except that the last
specification (and only the last) must be either STOP or X.
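
The ordering rules above (last occurrence wins; STOP or X must come last and nowhere else) can be sketched as follows. This is an illustrative Python fragment under stated assumptions: the function name `resolve_specs` and the representation of each specification as a (name, value) pair are hypothetical and do not describe the SMART card reader.

```python
TERMINATORS = {"STOP", "X"}


def resolve_specs(specs):
    """Resolve a list of (name, value) pairs given in card order.

    Returns a dict mapping each specification name to the value from
    its last occurrence. Raises ValueError if the list does not end
    with STOP or X, or if either appears before the end.
    """
    if not specs or specs[-1][0] not in TERMINATORS:
        raise ValueError("last specification must be STOP or X")

    resolved = {}
    for name, value in specs[:-1]:
        if name in TERMINATORS:
            raise ValueError("only the last specification may be STOP or X")
        resolved[name] = value  # a later occurrence overwrites an earlier one
    return resolved
```

For example, a deck containing STATWT twice keeps only the second value, whatever other specifications lie between the two cards.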
3.1. Specifications Affecting Lookup
The SMART lookup uses a stem dictionary together with a suffix list
for an accurate determination of both semantic and syntactic roles. The
dictionary is stored in a semi-tree format, as is the suffix list. Details
of the lookup may be found in references [11] and [12].
The lookup is accurate enough that if, for example, HOPE and HOP are
two stems in the dictionary, HOPPING will be found as HOP + P + ING
while HOPING will be found as HOPE - E + ING; also, if EASY and EASE
are stems, EASIER is found from EASY while EASING is found from EASE.
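
The behavior in these examples can be imitated with a small Python sketch. It is not the SMART algorithm, which is described in references [11] and [12] and uses semi-tree tables. The spelling rules below (drop a final E, change Y to I, double a final consonant), the toy stem and suffix lists, and the names `attach` and `lookup` are all assumptions chosen to reproduce the four words discussed above.

```python
VOWELS = set("AEIOU")

# Toy tables for illustration; the real dictionary and suffix list are
# far larger and are stored in a semi-tree format.
STEMS = ["HOP", "HOPE", "EASY", "EASE"]
SUFFIXES = ["ING", "ER"]


def attach(stem, suffix):
    """Join stem and suffix; return (surface form, description)."""
    vowel_suffix = suffix[0] in VOWELS
    # Assumed rule 1: a final E is dropped before a vowel-initial suffix.
    if vowel_suffix and stem.endswith("E"):
        return stem[:-1] + suffix, f"{stem} - E + {suffix}"
    # Assumed rule 2: consonant + Y becomes I, except before an I.
    if (stem.endswith("Y") and len(stem) > 1
            and stem[-2] not in VOWELS and suffix[0] != "I"):
        return stem[:-1] + "I" + suffix, f"{stem} - Y + I + {suffix}"
    # Assumed rule 3: a consonant-vowel-consonant ending doubles its
    # final consonant before a vowel-initial suffix.
    if (vowel_suffix and len(stem) >= 3
            and stem[-1] not in VOWELS and stem[-1] not in "WXY"
            and stem[-2] in VOWELS and stem[-3] not in VOWELS):
        return stem + stem[-1] + suffix, f"{stem} + {stem[-1]} + {suffix}"
    return stem + suffix, f"{stem} + {suffix}"


def lookup(word):
    """Return descriptions of every stem (+ suffix) that spells word."""
    matches = []
    for stem in STEMS:
        if word == stem:
            matches.append(stem)
        for suffix in SUFFIXES:
            surface, description = attach(stem, suffix)
            if surface == word:
                matches.append(description)
    return matches
```

With these toy tables, `lookup("HOPPING")` returns `["HOP + P + ING"]` and `lookup("HOPING")` returns `["HOPE - E + ING"]`. The sketch generates every stem and suffix combination, which is adequate for an illustration but far slower than a tree-structured search.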
The specifications associated with the lookup are:
ENGTXT causes the English text being looked up to be printed;
NOTFND causes the words not found in the dictionary to be printed;
PUNCH causes document vectors to be punched out (with phrases, if
searched for) in binary form. This saves time in future
runs since the lookup need not be repeated.