<DOC> 
<DOCNO> IRS13 </DOCNO>         
<TITLE> Scientific Report No. IRS-13 Information Storage and Retrieval </TITLE>         
<SUBTITLE> Thesaurus, Phrase and Hierarchy Dictionaries </SUBTITLE>         
<TYPE> chapter </TYPE>         
<PAGE CHAPTER="7" NUMBER="22">                   
<AUTHOR1> E. M. Keen </AUTHOR1>  
<PUBLISHER> Harvard University </PUBLISHER> 
<EDITOR1> Gerard Salton </EDITOR1> 
<COPYRIGHT MTH="December" DAY="" YEAR="1967" BY="National Science Foundation">   
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government. 
</COPYRIGHT> 
<BODY> 
VII-22


improved by the IRE-3 thesaurus, and hardly significantly improved over

stem on Cran-1 and ADI.

       Data on individual request preferences based on this average

rank evaluation are given in Fig. 15.   It is noteworthy that the rank

position of the first relevant is unchanged by the use of a thesaurus in over

one quarter of the requests.  This is most strongly seen in the IRE-3

result, which shows that the drop in average rank of the first relevant with

the thesaurus is caused by only very few requests being inferior to stem.

The only small reversal of merit in Fig. 15 is the Cran-1 indexing result

using the average rank of the last relevant, where it is seen that on an

individual request basis, stem has a slight edge over thesaurus.
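
       The per-request comparison described above can be illustrated with a

minimal sketch. It assumes each request has a ranked list of document

identifiers and a set of relevant documents; the function names and the sample

ranks are hypothetical and are not taken from the report or from Fig. 15.

```python
from typing import Dict, Sequence, Set


def first_relevant_rank(ranking: Sequence[str], relevant: Set[str]) -> int:
    """Return the 1-based rank of the earliest relevant document."""
    for position, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            return position
    raise ValueError("no relevant document appears in the ranking")


def last_relevant_rank(ranking: Sequence[str], relevant: Set[str]) -> int:
    """Return the 1-based rank of the latest relevant document."""
    positions = [p for p, d in enumerate(ranking, start=1) if d in relevant]
    if not positions:
        raise ValueError("no relevant document appears in the ranking")
    return positions[-1]


def preference_counts(stem_ranks: Dict[str, int],
                      thesaurus_ranks: Dict[str, int]) -> Dict[str, int]:
    """Count requests preferring stem, preferring thesaurus, or unchanged.

    A lower rank is better, so a request prefers the dictionary that
    places the chosen relevant document earlier in the output.
    """
    counts = {"stem": 0, "thesaurus": 0, "equal": 0}
    for request, stem_rank in stem_ranks.items():
        thesaurus_rank = thesaurus_ranks[request]
        if stem_rank == thesaurus_rank:
            counts["equal"] += 1
        elif thesaurus_rank > stem_rank:
            counts["stem"] += 1
        else:
            counts["thesaurus"] += 1
    return counts


# Hypothetical ranks of the first relevant document for three requests.
stem = {"q1": 1, "q2": 4, "q3": 2}
thesaurus = {"q1": 1, "q2": 2, "q3": 5}
print(preference_counts(stem, thesaurus))
```

Counting the "equal" cases separately mirrors the observation that the rank of

the first relevant is unchanged for a sizeable share of the requests.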

       The use of mean rank position, as in Fig. 14, is not very well

suited to some of the data presented.   For example, the median rank

position of the first relevant document is nearly always one, so addi-

tional data on the rank position of the first relevant is given in Fig. 16.

Here it may be seen that the thesaurus dictionaries all produce results

for which two to six more of the requests have their first relevant in

rank positions one or two; in the Cran-1 and ADI collections, the number

of requests having the first relevant ranked later than ten is also

reduced by the thesaurus.
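
       The reason a mean or median alone can hide these differences may be

seen in a small sketch. It assumes a list holding the rank of the first

relevant document for each request; the function name, the rank bands chosen,

and the sample values are illustrative assumptions, not data from Fig. 16.

```python
from statistics import mean, median
from typing import Dict, Sequence


def summarize_first_relevant(ranks: Sequence[int]) -> Dict[str, float]:
    """Summarize first-relevant ranks by average and by rank band."""
    return {
        "mean": mean(ranks),
        "median": median(ranks),
        # Requests whose first relevant document is ranked one or two.
        "rank_1_or_2": sum(1 for r in ranks if r in (1, 2)),
        # Requests whose first relevant document is ranked later than ten.
        "later_than_10": sum(1 for r in ranks if r > 10),
    }


# Hypothetical ranks for eight requests: most are at rank one, two are far down.
sample_ranks = [1, 1, 1, 2, 1, 30, 1, 15]
print(summarize_first_relevant(sample_ranks))
```

In this sample the median is one while the mean is pulled up by two poorly

served requests, so the counts per rank band carry the more useful information.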

       The results in Figs. 6 and 7, which were based on matching

functions other than cosine numeric, are not presented in the form of

complete precision-recall graphs, but a simplified table giving the merit

at three positions on the precision-recall curves appears in Fig. 17.  In

general, the merit is the same as that seen for the normalized measures:

the cases where stem performs better than the thesaurus are of interest

</BODY>                  
</PAGE>                  
</DOC>