IRS13 Scientific Report No. IRS-13 Information Storage and Retrieval Document Length chapter E. M. Keen Harvard University Gerard Salton Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

and the matching algorithms are held unaltered while, say, abstracts and titles are compared. Considering the cosine correlation coefficient for just one document in relation to one request, it is clear that a change from titles to abstracts will not affect Rw in the equation. Factor Dw, however, will increase directly with document length. Factor Mw will either increase or remain constant, depending on whether the abstract, compared with the title alone, matches more of the request concepts and/or increases the weights of the concepts that already match on titles.

The resulting difference in correlation coefficient between title and abstract input cannot be predicted in general: if the abstract provides more matching concepts or higher matching weights (Mw), and does not increase document length (Dw) too drastically, the abstract will give a higher correlation coefficient than the title. If the abstract provides no additional matching concepts or increased weights, then the correlation with abstracts will be less than that on titles.

An example of what happens in one particular case is given in Figure 2. Details of the request and relevant document are given, as well as portions of the document as looked up in a thesaurus dictionary using first the title only, then the whole abstract, then the full text. Document length increases sharply, from five concepts in the title to 12 in the abstract and 109 in the full text. The match between the request and document starts at two of the six possible concepts with titles; the use of abstracts increases the weight of these two matching concepts, and full text increases the matching concepts to all six, as well as improving weights.
However, the cosine correlation coefficients show that in this example the increases in document length exert more influence on the coefficient than the increases in matching concepts, so that the correlation coefficient drops from 0.3651 with titles to 0.3608 with abstracts, and further still to 0.2034 with full text.
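
The trade-off between matching weight and document length can be illustrated with a minimal Python sketch. It assumes that Mw is the sum of the products of request and document weights over the matching concepts, and that Rw and Dw are the sums of squared weights in the request and document respectively; the equation itself is not reproduced on this page, so these definitions are an assumption. The concept numbers and the uniform weights of 1 are hypothetical and are not taken from Figure 2. Under these assumptions, six request concepts, five title concepts, and two matches give 2/sqrt(6 x 5), which agrees with the 0.3651 reported for titles; the abstract and full-text figures depend on weights shown only in Figure 2 and are not reconstructed here.

```python
import math


def cosine(request, document):
    """Cosine correlation between two concept -> weight mappings."""
    # Mw (assumed definition): products of weights over matching concepts.
    mw = sum(w * document[c] for c, w in request.items() if c in document)
    # Rw and Dw (assumed definitions): sums of squared weights.
    rw = sum(w * w for w in request.values())
    dw = sum(w * w for w in document.values())
    if rw == 0 or dw == 0:
        return 0.0
    return mw / math.sqrt(rw * dw)


# Hypothetical concept numbers, all with weight 1 (an assumption).
request = {c: 1 for c in (101, 102, 103, 104, 105, 106)}  # six concepts
title = {c: 1 for c in (101, 102, 201, 202, 203)}         # five concepts, two match

# A longer hypothetical document: the same two matches plus seven more
# non-matching concepts, so Dw grows while Mw stays constant.
longer = {**title, **{c: 1 for c in range(301, 308)}}     # twelve concepts

title_score = cosine(request, title)
longer_score = cosine(request, longer)

print(round(title_score, 4))   # 0.3651
print(round(longer_score, 4))  # 0.2357
```

In the second case Mw is unchanged while Dw rises from 5 to 12, so the coefficient falls; this is the situation in which the added text provides no additional matching concepts or increased weights.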