ISR11
Scientific Report No. ISR-11 Information Storage and Retrieval
On Some Clustering Techniques for Information Retrieval
chapter
J. D. Broffitt
H. L. Morgan
J. V. Soden
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
highest ranking clusters makes the number of documents retrieved closer
to the number retrieved by Bonner's method.
The superiority of the cosine similarity measure is evidenced by
entry number one in Table 1. In this run, the cosine coefficient was
used to obtain the similarity matrices for Bonner's procedure. This entry
clearly dominates the next three entries for any reasonable measure of
"goodness". Further research using Bonner's method should therefore
focus attention upon this similarity measure.
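
For reference, the cosine coefficient mentioned above can be sketched as follows. This is a minimal illustration only, not the program used for the runs reported here: it assumes each document is represented as a list of numeric term weights of equal length, and the function names (cosine_similarity, similarity_matrix) and the zero-vector convention are assumptions introduced for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine coefficient between two equal-length term-weight vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Assumed convention: a document with no terms is similar to nothing.
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_matrix(docs):
    """Full document-document matrix of cosine coefficients."""
    n = len(docs)
    return [[cosine_similarity(docs[i], docs[j]) for j in range(n)]
            for i in range(n)]
```

A matrix built this way is symmetric, and every document with at least one nonzero term weight has a coefficient of one with itself. A clustering procedure that takes a similarity matrix as input, such as Bonner's, could then be run on it in place of a matrix obtained from another coefficient.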
If we compare the results for all of the entries using the manual
relevance judgments with those using the "automatic" relevance judgments,
it is clear that the relative ranking of the entries is not changed. These
results are, for all practical purposes, invariant under either set of
relevance judgments.
The above results should be viewed in their proper light. The document
collection is very small and the number of computer runs obtained so far
too few to make any strong statements. It is clear that a more thorough
analysis of the effects of the parameters of these clustering procedures
should be made before any solid conclusions are drawn regarding the relative
efficiency of the two methods tested.