Scientific Report No. ISR-11
Information Storage and Retrieval
On Some Clustering Techniques for Information Retrieval
J. D. Broffitt, H. L. Morgan, J. V. Soden
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

in the cluster. This set of classification vectors is used in the two-level search procedure, as described at the end of section 3.

5. The Experiment

The experimental environment consisted of 82 documents and 20 queries from an American Documentation Institute (ADI) collection. These documents were automatically indexed by the SMART automatic document retrieval system [5]. This system provided 82 document vectors and 20 query vectors in 601-dimensional Euclidean space. These vectors were then normalized to length one and used as input to both Bonner's and Rocchio's clustering procedures. Since each of these procedures depends on several parameters, many runs were planned with each method in order to determine empirically desirable values for these parameters.

The results presented below are based mainly on six computer runs, four using Bonner's clustering method and two using Rocchio's clustering method. Each run consists of a clustering process for the 82 documents, resulting in the generation of the classification vectors, followed by a two-level search for each of the 20 queries. In addition, a run was made to match each query with the entire document collection using the cosine matching function, so as to obtain an ordering of the documents by this correlation for each of the 20 queries.

At the end of the first level of the search, only the two clusters which correlate highest with a query are retained. This is done because of the small size of the collection. If more than two clusters are retained, the total number of comparisons with a given query vector becomes