Scientific Report No. ISR-11, Information Storage and Retrieval
On Some Clustering Techniques for Information Retrieval
J. D. Broffitt, H. L. Morgan, J. V. Soden
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

7. Results and Conclusions

The results of the study are summarized in Table 1. One striking characteristic of Bonner's method is the large number of clusters it produces. This is to be expected, since Bonner refuses to associate dissimilar documents, whereas Rocchio allows dissimilar documents to be associated in order to build clusters of a size conducive to search efficiency. Thus, one would expect the mean number of matches made with a query vector to be smaller with Rocchio's method than with Bonner's method. The results support this hypothesis: approximately twice as many matches are required with Bonner's method as with Rocchio's method.

Next, restricting our attention to the recall and precision results obtained using the manual relevance judgments, it is apparent that Bonner's method exhibits higher precision than, and nearly equivalent recall to, Rocchio's method. One would expect the higher precision, since there are far fewer members in each cluster; hence, when a cluster is retrieved, it is more likely to contain a high percentage of relevant documents. Also, members of the same cluster are more similar to one another with Bonner's method than with Rocchio's method. Hence, if a cluster is similar to the query, more of the documents in it are likely to be relevant, since they are all very similar. The nearly equivalent recall between the two methods is somewhat surprising, as one would expect the large number of documents retrieved by Rocchio's method to include more of the relevant ones.
This is the case if the two highest-ranking clusters are used, but not if only the highest-ranking cluster is used, although using only the