Scientific Report No. ISR-11, Information Storage and Retrieval. Chapter: Information Analysis and Dictionary Construction. G. Salton, M. E. Lesk. Harvard University. Gerard Salton. Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

included in any of the dictionaries. Experiments were conducted with the SMART system, using both unrestricted vocabularies (full null thesaurus) and frequency-restricted entries (partial null). A sample set of document abstracts of some 50,000 total running words would typically produce a full null thesaurus of about 2,800 distinct word stems, and a partial null dictionary of about 900 stems (assuming a frequency of at least four occurrences for each entry listed).

If it is desired to list word stems, rather than full words, these must of course first be generated by a suffix cut-off system. To this effect, a suffix dictionary is built, a typical example of which is shown in the accompanying figure. The lookup procedure in this suffix dictionary is described in the next chapter, together with the lookup procedures for the other dictionaries. The structure of the suffix dictionary may, however, be examined immediately.

It may be seen from the figure that each suffix is listed with a sequence number and with one or more syntactic codes. The latter may be used if it later becomes necessary to recombine stems and suffixes into complete, acceptable words, as may be required, for example, to carry out a syntactic analysis. The syntactic codes included in the suffix dictionary represent only partial homographs, which must be combined with complementing codes attached to the word stems in order to determine which suffixes match which stems. (The syntactic codes attached to the word stems included in the null thesaurus are not shown in the output of Fig. 3.)
For example, a partial homograph such as OTIO from the null dictionary will combine with a partial homograph code from the suffix list, such as VOOSO, to form a complete homograph. In this case the complete code is VTISO, indicating a single-object transitive verb in the third person singular.
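The combination of partial homograph codes in the example above can be illustrated with a short sketch. The report does not state the combination rule at this point, so the rule used here is an assumption: the two codes are overlaid position by position, the character "O" is treated as an unfilled placeholder, and the shorter code is padded with placeholders on the right. The function name `combine_codes` and the constant `PLACEHOLDER` are hypothetical. Under these assumptions the overlay reproduces the example given in the text.

```python
from typing import Optional

# Assumed filler character marking an unfilled position in a partial code.
# The report does not define it; this choice is an illustration only.
PLACEHOLDER = "O"


def combine_codes(stem_code: str, suffix_code: str) -> Optional[str]:
    """Overlay two partial homograph codes position by position.

    Returns the combined code, or None when both codes fill the same
    position with different characters (stem and suffix do not match).
    """
    width = max(len(stem_code), len(suffix_code))
    # Pad the shorter code on the right so both have the same length.
    a = stem_code.ljust(width, PLACEHOLDER)
    b = suffix_code.ljust(width, PLACEHOLDER)

    merged = []
    for x, y in zip(a, b):
        if x == PLACEHOLDER:
            merged.append(y)          # take whatever the suffix supplies
        elif y == PLACEHOLDER or x == y:
            merged.append(x)          # take the stem's character
        else:
            return None               # conflicting codes: no valid match
    return "".join(merged)
```

With the codes quoted in the text, `combine_codes("OTIO", "VOOSO")` yields `"VTISO"`. A `None` result would indicate, under the assumed rule, that the suffix cannot be attached to the stem.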
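The construction of the full and partial null thesauri described earlier can likewise be sketched in a few lines. This is a minimal illustration, not the SMART procedure: the suffix list `SUFFIXES`, the longest-match cut-off rule, the minimum stem length, and the names `strip_suffix` and `null_thesaurus` are all assumptions made for the example, since the actual suffix lookup is described in the next chapter of the report. Only the frequency cutoff of four occurrences for the partial null dictionary comes from the text.

```python
import re
from collections import Counter

# Hypothetical suffix list for illustration; the real suffix dictionary
# also carries a sequence number and syntactic codes for each entry.
SUFFIXES = ["ations", "ation", "ing", "ed", "es", "s"]


def strip_suffix(word, suffixes=SUFFIXES, min_stem=3):
    """Cut off the longest matching suffix (assumed rule), keeping at
    least `min_stem` characters of stem."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def null_thesaurus(text, min_freq=1):
    """Return a mapping of stem to occurrence count.

    min_freq=1 gives a full null thesaurus (every distinct stem);
    min_freq=4 gives a partial null dictionary as described in the text.
    """
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(strip_suffix(w) for w in words)
    return {stem: n for stem, n in counts.items() if n >= min_freq}
```

Calling `null_thesaurus(abstracts)` lists every distinct stem, while `null_thesaurus(abstracts, min_freq=4)` keeps only stems occurring at least four times, which is the restriction that reduces roughly 2,800 stems to about 900 in the sample cited above.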