Scientific Report No. IRS-13
Information Storage and Retrieval
An Analysis of the Documentation Requests (chapter)
E. M. Keen
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

actually performed poorly (two "A" and one "B").

The performance results in Figure 14 compare the "A" and "B" requests for two dictionary runs, each made on two different document lengths, text and abstracts. The expected superiority of the "A" requests is seen in the stem dictionary results, but with the thesaurus the "B" requests perform slightly better. Since the "B" requests are quite inferior to "A" with the stem dictionary, this does leave a greater opportunity for improvement with the thesaurus; the initial inferiority on stem requires an explanation, however, as does the fact that the thesaurus does not much improve the "A" requests.

It is difficult to isolate any fundamental reasons for this result, because individual problems with both the stem and thesaurus dictionaries seem primarily to be the cause, as the following example shows. Request B10, with a normalized recall of 0.8205 with thesaurus and 0.3718 with stem, has four documents assessed as relevant, and the thesaurus produces improvements in rank positions of 22, 26, 32, and 60 compared with stem. Reasons for the superiority of thesaurus in this case are:

a) The thesaurus provides additional matching concepts between the request and all four relevant documents, including the important concept "computer". The stem dictionary fails to match this concept, because the suffix routine used does not conflate all word forms, and "computation" is separated from "compute" and "computer".
b) The thesaurus does not contain "system" but regards it as a common word to be ignored, and although the stem dictionary uses it and establishes matches with all four relevant documents, this high-frequency word also establishes matches with many non-relevant documents.
c) The very important request concept "chemistry" is grouped with synonyms in the thesaurus which successfully increase the weight