ISR11 Scientific Report No. ISR-11 Information Storage and Retrieval Information Analysis and Dictionary Construction chapter G. Salton M. E. Lesk Harvard University Gerard Salton Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

abstracts. Fig. 10, on the other hand, illustrates the effect of the weighting procedure. In each case, a perfect result would be indicated by having both a recall and a precision of 1, which in the recall-precision graph implies a curve concentrated in the upper right-hand corner of the grid. The fact that the curves actually vary between a precision of 0.8 and 0.9 for a recall of 0.1, and a precision of 0.1 to 0.[OCRerr] for a recall of 1, shows that the retrieval results were less than perfect.

Fig. 9 indicates first of all that the null thesaurus procedure, when applied to the document titles only, performs much less well than when the thesaurus look-up is extended to complete document abstracts. Indeed the so-called "null title only" process produces a precision inferior by about 20 to 30 percent for a given recall level, compared to the other "full null" and "null title 2" processes. It is interesting to note, in this connection, that the "null title only" procedure is effectively equivalent to the use of a so-called KWIC index (keyword-in-context), which is widely advocated and used for retrieval purposes. Permuted document titles are listed in a KWIC index in such a way that a given title appears in the proper alphabetic position corresponding to each of the principal words contained in the title (for example, a title such as "Information Retrieval" will be listed under I for information and again under R for retrieval).
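
The permuted-title listing just described can be illustrated by a minimal sketch in Python. It is not the procedure used in the experiments reported here. The function name build_kwic_index, the stop list, and the "/" marker for the original start of a title are assumptions introduced only for this illustration, since the report does not specify which words count as principal or how entries are laid out.

```python
from collections import defaultdict

# Hypothetical stop list: the report does not say which words are "principal".
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}


def build_kwic_index(titles):
    """Map each principal title word to the titles containing it.

    Each entry is the title rotated so that the keyword comes first;
    a "/" (an assumed convention) marks where the original title began.
    """
    index = defaultdict(list)
    for title in titles:
        words = title.split()
        for i, word in enumerate(words):
            key = word.lower().strip(".,;:!?\"'()")
            if not key or key in STOP_WORDS:
                continue  # skip punctuation-only tokens and common words
            if i == 0:
                rotated = title
            else:
                rotated = " ".join(words[i:] + ["/"] + words[:i])
            index[key].append(rotated)
    # Sort by keyword so entries appear in alphabetic position.
    return dict(sorted(index.items()))


kwic = build_kwic_index(["Information Retrieval"])
for key, entries in kwic.items():
    # The first letter is the alphabetic section under which the title is filed.
    print(key[0].upper(), key, entries)
```

For the title "Information Retrieval" the sketch files one entry under I and one under R, matching the example in the text. Only title words ever reach such an index, which is the limitation the comparison in Fig. 9 measures.
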
It may be that a KWIC index is more useful than no index at all, but it is quite clear - as reflected in the results of Fig. 9 - that a process which takes into account only the words from document titles is not nearly as effective as an equally simple process which matches word stems from full text. The other two curves included in Fig. 9 cover the already mentioned