Scientific Report No. IRS-13, Information Storage and Retrieval. Document Length (chapter). E. M. Keen. Harvard University. Gerard Salton. Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

It may be expected that, commencing with documents short in length, any increase in length will increase the number of concepts that match between the requests and documents. In the type of test environment used by SMART, namely a simulated real-life situation using requests and relevance judgments that are inevitably subjective in nature, it is quite rare for any short document to match all of the request concepts completely. In cases where a complete match does occur, it is naturally not necessary to increase the document length to improve the request/document match, except that in the numeric vectors scheme the matching concepts are often increased in the longer documents.

The effect of the use of the cosine correlation with numeric vectors is complex, because this matching scheme includes the length of both the request and the document, as well as the matching concepts, in the algorithm, as follows:

$$\text{Cosine Correlation Coefficient} = \frac{M_w}{\sqrt{R_w \cdot D_w}}$$

where

$M_w$ = The concepts that Match between a Request and a Document, using the sums of products of the weights assigned to the matching concepts;

$R_w$ = The total concepts in the Request, using the sums of the squares of the weights assigned to the concepts;

$D_w$ = The total concepts in the Document, using the sums of the squares of the weights assigned to the concepts.

The resulting coefficient is obtained for each request in relation to every document in the collection, so that the output of the search may be an ordered list of documents. In tests investigating document length, all other variables such as the request set, the document collection, the word dictionary