ISR11
Scientific Report No. ISR-11 Information Storage and Retrieval
Operating Instructions for the SMART Text Processing and Document Retrieval System
M. E. Lesk
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
frequently used for auxiliary functions. These are described in the present
section.
6.1. THES
THES is a program to form null dictionaries from a collection of
English text. It also prints frequency counts and word listings as a
by-product, and it includes a suffixing routine.
THES requires a single control card containing three integers. The
first integer, punched right-adjusted in columns 1-5, specifies the maximum
number of concepts (words) to be included in the null dictionary. The second
integer, punched right-adjusted in columns 6-10, specifies the minimum number
of occurrences in the collection that a word must have to be included in the
dictionary. These two numbers let the user control the size of the null
dictionary; for a complete null dictionary, the first number should be very
large and the second should be 1. The third integer, punched right-adjusted
in columns 11-13, specifies the tape on which the document collection is
located. If this field is blank, tape 5 (the input tape) is assumed.
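The fixed-column layout of the control card can be illustrated with a small parser. This is a minimal Python sketch, not part of the SMART system; the names `ThesControlCard` and `parse_control_card` are hypothetical. It assumes that a card shorter than 80 columns is padded with blanks and that the first two fields are always punched (a blank there raises `ValueError`).

```python
from typing import NamedTuple


class ThesControlCard(NamedTuple):
    """Hypothetical container for the three THES control-card fields."""
    max_concepts: int      # columns 1-5: largest null dictionary allowed
    min_occurrences: int   # columns 6-10: frequency cutoff for a word
    tape: int              # columns 11-13: collection tape (blank -> 5)


def parse_control_card(card: str) -> ThesControlCard:
    """Parse a THES control card by fixed, 1-based card columns."""
    # Assumption: a short line stands for an 80-column card padded with blanks.
    card = card.ljust(80)

    def field(first: int, last: int) -> str:
        # Card columns are 1-based and inclusive; Python slices are 0-based.
        return card[first - 1:last].strip()

    max_concepts = int(field(1, 5))      # int("") raises ValueError if blank
    min_occurrences = int(field(6, 10))
    tape_text = field(11, 13)
    tape = int(tape_text) if tape_text else 5   # blank field -> input tape 5
    return ThesControlCard(max_concepts, min_occurrences, tape)
```

For example, `parse_control_card("99999    1")` requests a complete dictionary (a very large limit and a cutoff of 1) read from tape 5, because columns 11-13 are blank.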
The collection is placed on the specified tape in normal SMART format
([OCRerr].l), with each document preceded by a *TEXT card only; *FIND and
*LIKE cards should not be used. Since no searches are made during thesaurus
construction, the requests may also be labeled *TEXT if they are to be
included in the counts. A *STOP card ends the collection.
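The deck layout and the two control-card thresholds can be combined in a rough sketch of how a null dictionary might be assembled. This Python sketch is an assumption, not the THES implementation: the function names are hypothetical, words are taken to be runs of letters folded to upper case, the suffixing routine is not modeled, and words meeting the occurrence cutoff are ranked by decreasing frequency (ties alphabetical) before the concept limit is applied.

```python
import re
from collections import Counter


def count_collection(cards):
    """Count word occurrences over the documents of a SMART-style deck.

    Assumes every document begins with a '*TEXT' card, the deck ends with
    a '*STOP' card, and all other cards carry document text.
    """
    counts = Counter()
    in_document = False
    for card in cards:
        fields = card.split()
        keyword = fields[0].upper() if fields else ""
        if keyword == "*STOP":          # end of the collection
            break
        if keyword == "*TEXT":          # start of a new document
            in_document = True
            continue
        if keyword in ("*FIND", "*LIKE"):
            raise ValueError(f"{keyword} cards are not allowed in THES input")
        if in_document:
            counts.update(w.upper() for w in re.findall(r"[A-Za-z]+", card))
    return counts


def select_null_dictionary(counts, max_concepts, min_occurrences):
    """Keep words with at least min_occurrences, up to max_concepts words."""
    eligible = [(w, n) for w, n in counts.items() if n >= min_occurrences]
    # Assumption: most frequent words are kept first; ties broken alphabetically.
    eligible.sort(key=lambda item: (-item[1], item[0]))
    return eligible[:max_concepts]
```

With a very large `max_concepts` and a `min_occurrences` of 1, every counted word is kept, which corresponds to the complete null dictionary described above.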
6.2. [OCRerr]RVAL
[OCRerr][OCRerr]VAL is a program to compute additional evaluation data for a set of