ISR11
Scientific Report No. ISR-11 Information Storage and Retrieval
Operating Instructions for the SMART Text Processing and Document Retrieval System
chapter
M. E. Lesk
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
concepts which co-occur with similar concepts. For example, if one is
seeking to identify that "aeroplane" and "aircraft" represent the same
concept, one might deduce this not from the fact that they co-occur in
the same document, but that they each separately co-occur with words
such as "fuselage", "propeller", "aileron", and the like. The SMART system
makes provision for iterated concept-concept correlation to any depth.
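
The idea of iterated correlation can be illustrated with a small sketch. It is not the SMART implementation: the function names, the toy data, and the rule that each iteration correlates the rows of the previous concept-concept matrix (diagonal included) are assumptions made for illustration. The overlap measure shown is one common definition, sum of minima divided by the smaller row total, and may differ from the one SMART uses.

```python
import math

def cosine(u, v):
    """Cosine correlation of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def overlap(u, v):
    """Overlap correlation (assumed form): sum of minima over smaller total."""
    denom = min(sum(u), sum(v))
    return sum(min(a, b) for a, b in zip(u, v)) / denom if denom else 0.0

def correlate(rows, mode=cosine, cutoff=0.0):
    """Correlate every pair of rows; values below the cutoff are set to 0."""
    n = len(rows)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            c = mode(rows[i], rows[j])
            out[i][j] = c if c >= cutoff else 0.0
    return out

def iterated_correlation(concept_doc, iterations, mode=cosine, cutoff=0.0):
    """Hypothetical helper: apply concept-concept correlation repeatedly.

    Iteration 1 correlates concepts by the documents they occur in;
    each later iteration correlates the rows of the previous result.
    """
    matrix = concept_doc
    for _ in range(iterations):
        matrix = correlate(matrix, mode, cutoff)
    return matrix

# Toy data (invented): rows are concepts, columns are documents.
concepts = ["aeroplane", "aircraft", "fuselage", "propeller"]
concept_doc = [
    [1, 1, 0, 0],  # aeroplane occurs in documents 0 and 1
    [0, 0, 1, 1],  # aircraft occurs in documents 2 and 3
    [1, 0, 1, 0],  # fuselage occurs in documents 0 and 2
    [0, 1, 0, 1],  # propeller occurs in documents 1 and 3
]

first = iterated_correlation(concept_doc, 1)
second = iterated_correlation(concept_doc, 2)
print("first-order  aeroplane/aircraft:", round(first[0][1], 3))
print("second-order aeroplane/aircraft:", round(second[0][1], 3))
```

In this toy collection "aeroplane" and "aircraft" never share a document, so their first-order correlation is zero, yet each co-occurs with "fuselage" and "propeller", and the second iteration therefore gives the pair a positive correlation.
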
The specifications affecting this procedure are as follows:
CONCON n the number "n" is the number of iterations of the
concept-concept correlation process. "CONCON 1"
specifies simple first-order correlation;
MODECC a a is either "COS" or "OVLAP" to specify cosine or
overlap correlation for the first concept-concept correlation;
MOD2C a a is either "COS" or "OVLAP" to specify the correlation mode
for concept-concept correlations after the first
correlation, i.e., for the second and following iterations;
CUTCC x x is a number between 0 and 1 (0.6 is typical) to specify
the cutoff for the first concept-concept correlation;
CUT2C x x is a number between 0 and 1 specifying the cutoff for
correlations after the first;
CONMIN n n specifies the lowest concept number which will be
correlated. This specification is to be used primarily
with null dictionaries prepared by THES (see part 6.1)
in which the concepts are arranged by frequency, thus making
it possible to specify the lowest frequency word to be
correlated;
CONMAX n n specifies the highest concept number which will be
correlated. CONMIN and CONMAX are used because the statistical
procedures are not accurate for words which occur either
very rarely or very frequently;