IRS13 Scientific Report No. IRS-13 Information Storage and Retrieval An Experiment in Automatic Thesaurus Construction chapter R. T. Dattola D. M. Murray Harvard University Gerard Salton Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government. VIII-4 documents in which both occur. Then the following correlation functions may be used to compute S: 1) cosine: [OCRerr] = N 2) Tanimoto: S.. N L.+L.-N 1 3) overlap: [OCRerr] = N Min(L.,L.) 1 By applying a cut-off value k to the elements of S, the similarity matrix may be interpreted as a binary connection matrix for a graph G. The concepts form the node vector, and node i (concept i) is connected to node j (concept j) if and only if S.. > k. The initial concept classes of the subcollection are specified by the maximal complete subgraphs of G. E6] Given the node vector, the degree vector of these nodes, and the corresponding connection matrix, the algor- ithm for finding the maxima complete subgraphs of G is as follows: 1) pick the first node i which does not occur in any previous initial' class and use it as the start of a new initial class; 2) for each node j connected to node i, a) test if the degree of node j is greater than the number of nodes in the new initial class; and b) test if node j is connected to all other nodes in the new class;