ISR11
Scientific Report No. ISR-11 Information Storage and Retrieval
Operating Instructions for the SMART Text Processing and Document Retrieval System
chapter
M. E. Lesk
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
taining ZZZZZZ in columns 1-6. The suffix list may appear in any order,
provided that the first suffix begins with "e".
If the thesaurus is introduced from cards (control word BOTH or THE[OCRerr]),
it follows the suffix list (if any) immediately. If there is no suffix
list, it follows the first data card. The thesaurus may contain any number
of words. Each word is punched on a separate card. Since a suffixing
routine is available, only the word stems should be entered in the dictionary.
The suffixing routine will accept multiple suffixes, and also
recognizes that words may double their final consonant, change a final "y"
to an "i", or drop a final "e" before adding a suffix [7]. This permits a
highly accurate lookup in which almost all derivatives of most English
words can be identified automatically [12]. Of course, the extremely
variant forms (such as "mice", "geese", etc.) must be entered in the
dictionary as separate words. It is sometimes necessary to enter complete forms
of words to preserve important distinctions. For example, "programmer"
is readily identified from "program" + "m" + "er", but if it is desired
to distinguish a programmer from a program, both forms should be entered.
The dictionary lookup procedure may always be used as a full paradigm
dictionary by entering every form of a word in the thesaurus, at the
expense of a great deal of keypunching. Words entered explicitly take
precedence in the lookup over possible stem-and-suffix combinations.
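
The lookup rules described above (explicit entries first, then suffix
removal with allowance for a doubled final consonant, a "y" changed to "i",
or a dropped final "e") can be illustrated with a short modern sketch. This
is not the SMART routine itself: the function names, the sample suffix list,
and the limit on the number of stacked suffixes are all assumptions made
for illustration.

```python
# Illustrative sketch only; not the original SMART suffixing routine.
# SUFFIXES, candidate_stems, and lookup are hypothetical names.
SUFFIXES = ["s", "es", "ed", "er", "ing", "ly"]  # assumed sample suffix list


def candidate_stems(base):
    """Yield stems that could have produced `base` before a suffix was added."""
    yield base                      # stem unchanged: "program" + "s"
    yield base + "e"                # final "e" was dropped: "retriev" -> "retrieve"
    if base.endswith("i"):
        yield base[:-1] + "y"       # final "y" became "i": "librari" -> "library"
    if len(base) >= 2 and base[-1] == base[-2] and base[-1] not in "aeiou":
        yield base[:-1]             # final consonant was doubled: "programm" -> "program"


def lookup(word, dictionary, suffixes=SUFFIXES, max_suffixes=3):
    """Return the dictionary entry matching `word`, or None if there is none."""
    word = word.lower()
    # Words entered explicitly take precedence over stem-and-suffix combinations.
    if word in dictionary:
        return word
    if max_suffixes == 0:
        return None
    # Try longer suffixes first; recursion handles multiple stacked suffixes.
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) > len(suffix):
            base = word[:-len(suffix)]
            for stem in candidate_stems(base):
                found = lookup(stem, dictionary, suffixes, max_suffixes - 1)
                if found is not None:
                    return found
    return None
```

With a dictionary holding only the stem "program", both "programmer" and
"programmers" resolve to "program"; once "programmer" is also entered
explicitly, it is matched directly, which preserves the distinction
mentioned above. An irregular form such as "geese" is not found unless it
is entered as a separate word.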
Each stem to be entered in the dictionary is punched left-adjusted
in columns 1-24 of a card, with trailing blanks.
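
As a rough illustration of this card layout, the following sketch reads
one thesaurus card held as a text line: the stem from columns 1-24, and
the concept-number fields of columns 25-54 that the next paragraph
describes. The function name is hypothetical, and the treatment of blank
fields as unused and the padding to 80 columns are assumptions rather than
documented behavior.

```python
# Illustrative sketch only; parse_thesaurus_card is a hypothetical helper.
def parse_thesaurus_card(card):
    """Split one card image into (stem, list of concept numbers)."""
    card = card.rstrip("\n").ljust(80)   # assumed: pad short lines to 80 columns
    stem = card[0:24].rstrip()           # columns 1-24, left-adjusted
    if not stem:
        raise ValueError("card has no stem in columns 1-24")
    concepts = []
    # Six five-column fields: columns 25-29, 30-34, ..., 50-54.
    for start in range(24, 54, 5):
        field = card[start:start + 5].strip()
        if field:                        # assumed: a blank field is unused
            concepts.append(int(field))
    if not concepts:
        raise ValueError("card needs between one and six concept numbers")
    return stem, concepts
```

For example, a card with "PROGRAM" in columns 1-7, "101" right-adjusted in
columns 25-29, and "27" right-adjusted in columns 30-34 yields the stem
"PROGRAM" with concept numbers 101 and 27.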
With each word are associated between one and six concept numbers.
These are punched right-adjusted in five-column fields, starting in
column 25 and extending to column 54. The first concept number is punched in