Scientific Report No. IRS-13 Information Storage and Retrieval

Thesaurus, Phrase and Hierarchy Dictionaries (chapter), E. M. Keen
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

3. Initial thesaurus versions, and dictionaries made without the construction rules, are inferior to revised versions and to versions made using the rules. In two of seven comparisons, the initial thesaurus versions perform worse than the stem process (comparisons 8 and 12 worse than stem; comparisons 1, 3, 6, 9 and 13 superior to stem).
4. The thesaurus superiority is not always preserved when less-than-optimal document length and matching function parameters are used: of twelve comparisons, three are inferior to stem (comparisons 22, 25 and 26 inferior; comparisons 4, 5, 7, 11, 14, 15, 20, 24 and 27 superior).
5. For users needing high precision, who want only one or two relevant documents, the thesaurus is little better than stem on IRE-3, but on Cran-1 and ADI it shows a larger superiority (Figs. 14, 15 and 16).
6. For users with a very high recall need, measured by the average rank of the last relevant document, the thesaurus gives a good improvement over stem on IRE-3, but only a very small gain on Cran-1 and ADI (Figs. 14 and 15).
7. The ADI thesaurus-SA1, made by the semi-automatic rules, does not perform well. It is inferior to the regular ADI thesaurus-1 in all cases, and in four of five comparisons it is also inferior to stem (comparisons 17, 19, 23 and 26 inferior; comparison 21 superior).

Results comparing the thesaurus alone with the thesaurus plus phrase dictionaries are as follows:
1. Phrase dictionaries improve on the thesaurus alone by only a very small amount, and only on IRE-3 and ADI; on Cran-1 the thesaurus alone gives a slightly better result (Figs. 19 and 21).
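The high-recall finding on the stem and thesaurus processes relies on the average rank of the last relevant document. For each query, take the position in the ranked output at which the final relevant document appears, then average these positions over all queries. A lower value means a user who wants every relevant document must examine fewer items. The following is a minimal Python sketch of that computation. It assumes each query's ranking contains every relevant document (for example, because the whole collection is ranked). The function names and the toy rankings are hypothetical illustrations, not the SMART system's code or data from this report.

```python
def rank_of_last_relevant(ranking, relevant):
    """Return the 1-based rank at which the last relevant document appears.

    ranking  -- document ids in retrieval order, best match first
    relevant -- set of ids judged relevant to the query
    Assumes every relevant document appears somewhere in the ranking
    (for example, because the full collection is ranked).
    """
    if not relevant:
        raise ValueError("query has no relevant documents")
    missing = relevant - set(ranking)
    if missing:
        raise ValueError(f"relevant documents not ranked: {sorted(missing)}")
    # 1-based position of each relevant document in the ranked output.
    positions = [i + 1 for i, doc in enumerate(ranking) if doc in relevant]
    return max(positions)


def average_last_relevant_rank(runs):
    """Average the last-relevant rank over (ranking, relevant) pairs, one per query.

    Lower is better: a high-recall user must scan fewer documents
    before seeing every relevant one.
    """
    ranks = [rank_of_last_relevant(ranking, relevant) for ranking, relevant in runs]
    return sum(ranks) / len(ranks)


# Hypothetical toy rankings for two queries (illustrative only, not report data).
stem_runs = [
    (["d3", "d1", "d7", "d2", "d5"], {"d1", "d2"}),  # last relevant at rank 4
    (["d4", "d6", "d2", "d8", "d9"], {"d4", "d9"}),  # last relevant at rank 5
]
thesaurus_runs = [
    (["d1", "d2", "d3", "d7", "d5"], {"d1", "d2"}),  # last relevant at rank 2
    (["d4", "d9", "d6", "d2", "d8"], {"d4", "d9"}),  # last relevant at rank 2
]

print(average_last_relevant_rank(stem_runs))       # 4.5
print(average_last_relevant_rank(thesaurus_runs))  # 2.0
```

Because only the position of the final relevant document counts, this measure rewards a process that pulls its hardest-to-find relevant documents upward, regardless of how early the first relevant document appears. Early precision, as in the finding on users wanting one or two relevant documents, needs a separate measure.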