ISR11
Scientific Report No. ISR-11 Information Storage and Retrieval
Information Analysis and Dictionary Construction
chapter
G. Salton
M. E. Lesk
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
abstracts. Fig. 10, on the other hand, illustrates the effect of the
weighting procedure. In each case, a perfect result would be indicated by
having both a recall and a precision of 1, which in the recall-precision
graph implies a curve concentrated in the upper right-hand corner of the
grid. The fact that the curves actually vary between a precision of 0.8
and 0.9 for a recall of 0.1, and a precision of 0.1 to 0.[OCRerr] for a recall of
1 shows that the retrieval results were less than perfect.
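
As an illustration of how the points of such a recall-precision graph arise, the following minimal Python sketch computes recall and precision after each document in a ranked output list. It assumes the usual definitions (recall is the proportion of relevant documents retrieved; precision is the proportion of retrieved documents that are relevant). The function name and the sample document identifiers are hypothetical and are not taken from the experiments reported here.

```python
def recall_precision_points(ranked_ids, relevant_ids):
    """Return a (recall, precision) pair after each rank of a ranked list."""
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("at least one relevant document is required")
    points = []
    hits = 0  # relevant documents seen so far
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
        recall = hits / len(relevant)   # share of relevant items retrieved
        precision = hits / rank         # share of retrieved items relevant
        points.append((recall, precision))
    return points


# Hypothetical ranking of four documents, two of which are relevant.
ranked = ["d3", "d7", "d1", "d9"]
relevant = ["d3", "d1"]
points = recall_precision_points(ranked, relevant)
```

Under these assumptions, a ranking that places every relevant document ahead of every nonrelevant one keeps precision at 1 until recall reaches 1, which corresponds to the perfect result described above.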
Fig. 9 indicates first of all that the null thesaurus procedure, when
applied to the document titles only, performs much less well than when the
thesaurus look-up is extended to complete document abstracts. Indeed the
so-called "null title only" process produces a precision inferior by about
20 to 30 percent for a given recall level, compared to the other "full null"
and "null title 2" processes. It is interesting to note, in this connection,
that the "null title only" procedure is effectively equivalent to the use
of a so-called KWIC index (keyword-in-context) which is widely advocated
and used for retrieval purposes. Permuted document titles are listed in
a KWIC index in such a way that a given title appears in the proper alphabetic
position corresponding to each of the principal words contained in
the title (for example, a title such as "Information Retrieval" will be
listed under I for information and again under R for retrieval). It may be
that a KWIC index is more useful than no index at all, but it is quite
clear - as reflected in the results of Fig. 9 - that a process which
takes into account only the words from document titles is not nearly as
effective as an equally simple process which matches word stems from full
text.
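
The title-listing principle just described can be sketched in a few lines of Python. This is a simplified illustration only: it files each title under each of its principal words, but does not reproduce the permuted, keyword-centered line layout of a printed KWIC index. The stop list, the function name, and the sample titles are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical stop list of non-principal words; a real system would use
# a much larger one.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}


def kwic_index(titles):
    """Map each principal (non-stop) title word to the titles containing it."""
    index = defaultdict(list)
    for title in titles:
        seen = set()  # avoid listing a title twice under the same word
        for word in title.split():
            key = word.strip(".,;:!?\"'()").lower()
            if key and key not in STOP_WORDS and key not in seen:
                seen.add(key)
                index[key].append(title)
    # Return the entries in alphabetic order of the index words.
    return dict(sorted(index.items()))


index = kwic_index(["Information Retrieval", "Retrieval of Documents"])
```

Here the title "Information Retrieval" is filed under both "information" and "retrieval", as in the example above. Only words occurring in titles can ever serve as access points.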
The other two curves included in Fig. 9 cover the already mentioned