Scientific Report No. IRS-13, Information Storage and Retrieval
Chapter: Correlation Measures
K. Reitsma, J. Sagalyn
Harvard University
Gerard Salton

Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

the two vectors have in common. The summation equals 3, meaning there are three concepts found in both v and w, namely concepts 1, 4, and 6.

However, the same expression used with weighted vectors does not produce the same simple interpretation. For example, given the two vectors

v = (12, 24, 0, 36, 0, 12, 0)
w = (24, 0, 12, 24, 0, 12, 36)

the above equation (EQ-1) gives a value of 1296 (12·24 + 36·24 + 12·12 = 288 + 864 + 144). Although each of these vectors contains the same concepts as the binary vectors above, and the two have the same three concepts in common, there is no simple interpretation for the number 1296. The closest interpretation is that it is a relative value, which can be compared with the figure obtained by applying the summation to v and some other vector w' as a measure of the matching concepts. The comparison determines which vector, w or w', matches better with v.

An example of an expression which does not lose its meaning when weighted vectors are used instead of binary vectors is the following:

$$\sum_{i=1}^{t} (v_i)^2 \qquad (2)$$

This expression represents the square of the absolute length of the vector v in t-space, where t is the number of concepts possible in the description vector.

There exist coefficients other than these two to measure the similarity between documents. For the most part, these coefficients are used in thesaurus construction and measure the similarity between concepts. When calculating the term-term association coefficient, several of the expressions discussed above have a different interpretation. For example, given the term description vectors c1, c2, ... where for each term vector