<DOC> <DOCNO> IRS13 </DOCNO> <TITLE> Scientific Report No. IRS-13 Information Storage and Retrieval </TITLE> <SUBTITLE> Thesaurus, Phrase and Hierarchy Dictionaries </SUBTITLE> <TYPE> chapter </TYPE> <PAGE CHAPTER="7" NUMBER="22"> <AUTHOR1> E. M. Keen </AUTHOR1> <PUBLISHER> Harvard University </PUBLISHER> <EDITOR1> Gerard Salton </EDITOR1> <COPYRIGHT MTH="December" DAY="" YEAR="1967" BY="National Science Foundation"> Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government. </COPYRIGHT> <BODY> VII-22 improved by the IRE-3 thesaurus, and hardly significantly improved over stem on Cran-1 and ADI. Data on individual request preferences based on this average rank evaluation are given in Fig. 15. It is noteworthy that the rank position of the first relevant document is unchanged by the use of a thesaurus in over one quarter of the requests. This is most strongly seen in the IRE-3 result, which shows that the drop in average rank of the first relevant with the thesaurus is caused by only a very few requests being inferior to stem. The only small reversal of merit in Fig. 15 is the Cran-1 indexing result using the average rank of the last relevant, where it is seen that, on an individual request basis, stem has a slight edge over thesaurus. The use of mean rank position, as in Fig. 14, is not very well suited to some of the data presented. For example, the median rank position of the first relevant document is nearly always one, so additional data on the rank position of the first relevant are given in Fig. 16. Here it may be seen that the thesaurus dictionaries all produce results for which two to six more of the requests have their first relevant in rank positions one or two; in the Cran-1 and ADI collections, the number of requests having the first relevant ranked later than ten is also reduced by the thesaurus. The results in Figs. 
6 and 7, which were based on matching functions other than cosine numeric, are not presented in the form of complete precision-recall graphs, but a simplified table giving the merit at three positions on the precision-recall curves appears in Fig. 17. In general, the merit is the same as that seen for the normalized measures: the cases where stem performs better than the thesaurus are of interest </BODY> </PAGE> </DOC>