Scientific Report No. ISR-11
Information Storage and Retrieval
On Some Clustering Techniques for Information Retrieval
J. D. Broffitt, H. L. Morgan, J. V. Soden
Harvard University
Gerard Salton

Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

... highest ranking clusters makes the number of documents retrieved closer to the number retrieved by Bonner's method.

The superiority of the cosine similarity measure is evidenced by entry number one in Table 1. In this run, the cosine coefficient was used to obtain the similarity matrices for Bonner's procedure. This entry clearly dominates the next three entries by any reasonable measure of "goodness." Further research using Bonner's method should therefore concentrate on this similarity measure.

If we compare the results for all of the entries under the manual relevance judgments with those under the "automatic" relevance judgments, the relative ranking of the entries is unchanged. For all practical purposes, these results are invariant under either set of relevance judgments.

The above results should be viewed in their proper light. The document collection is very small, and too few computer runs have been made so far to support any strong statements. A more thorough analysis of the effects of the parameters of these clustering procedures is needed before any solid conclusions can be drawn regarding the relative efficiency of the two methods tested.
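The cosine coefficient used to build the similarity matrices for Bonner's procedure can be illustrated with a minimal sketch. It assumes each document is represented as a vector of index-term weights (the document representation used in these runs is not described in this section). The sketch computes the symmetric document-by-document similarity matrix that such a clustering procedure would take as input. The helper names `cosine` and `similarity_matrix` and the three-document toy collection are hypothetical, and the step by which Bonner's procedure forms clusters from the matrix is not shown.

```python
import math
from typing import List


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine coefficient: dot(u, v) / (|u| * |v|).

    Returns 0.0 if either vector is all zeros (no index terms assigned).
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(docs: List[List[float]]) -> List[List[float]]:
    """Symmetric n x n matrix of pairwise cosine coefficients (hypothetical helper)."""
    n = len(docs)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            s = cosine(docs[i], docs[j])
            sim[i][j] = s  # fill upper triangle (and diagonal)
            sim[j][i] = s  # mirror into lower triangle
    return sim


# Hypothetical toy collection: 3 documents described by 4 index terms.
docs = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 1, 1],
]

for row in similarity_matrix(docs):
    print([round(x, 2) for x in row])
```

Documents 0 and 2 share no index terms, so their coefficient is 0, while each adjacent pair shares one term out of two and scores 0.5. Because the cosine normalizes by vector length, long and short documents with the same term proportions receive the same similarity.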