Scientific Report No. ISR-10
Information Storage and Retrieval
Appendix A: The Smart System
Joseph John Rocchio
Harvard University
Gerard Salton

Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

APPENDIX A

SMART SYSTEM

The SMART automatic document retrieval system currently running on the IBM 7094 digital computer at Harvard University was used both as a simulation environment and as a data base generator for the experimental results presented in this thesis. As the SMART system has been thoroughly documented (references 1-[OCRerr]), only a brief summary of its main features is outlined here.

A. Content Analysis Techniques

The indexing function of the SMART system is capable of incorporating a number of automatic content analysis techniques. Documents are entered into the system in natural language (with a minimal number of keypunching conventions) and passed through a dictionary lookup phase. The lookup operates with a stem-suffix splitting algorithm (which incorporates spelling rules), and word stems are matched against entries of a stored dictionary. A variety of dictionaries may be used in the system, ranging from a simple one-to-one encoding (keyword dictionary) to a dictionary which produces a many-to-many thesaurus-type mapping. In addition to providing a semantic encoding for the detected stems, the lookup process has provisions for providing syntactic stem codes based on both the stem and suffix dictionaries. After the initial lookup phase, a coded