ISR11
Scientific Report No. ISR-11 Information Storage and Retrieval
Information Analysis and Dictionary Construction
chapter
G. Salton
M. E. Lesk
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
Iv-32
under Tabstract trees', which in turn would come under "graph theory", a
branch of algebra. It does not appear reasonable to expect that a
hierarchical arrangement of concepts will serve equally well for all uses
under all circ'mistances. Rather any hierarchy will serve its function, if
it can be counted upon to suggest [OCRerr][OCRerr]ys of broadening or narrowing a given
search request or a given interpretation of the subject matter under most
of the circumstances likely to arise in practice.
Dictionary ?erformance
Tn order to obtain an idea of the relative effectiveness of the
various dictionaries in a retrieval situation, s[OCRerr]e e[OCRerr]erimental results
may be presented, based in each case on averages obtained with 17 search
requests used in conjunction with a document collection of some 500 document
abstracts in the computer literature. The retrieval performance is measured
by two parameters, kno[OCRerr]m respectively as recall and precision. Recall is
defined as the prQ[OCRerr]ortion of relevant material actually retrieved and a
high recall score therefore im[OCRerr]lies that much of what is useful in a
collection has actually been produced during the search operation. Precision,
on the other hand, is the proportion of retrieved material which is actually
relevant, and a high precision score implies that very little useless
material had been obtained as a result of a given search. Clearly both of
these parameters are Lmportant, and a perfect search would therefore
exhibit both a high recall and a high precision.
Recall and precision results can be presented in many different forms.
One of the simplest [OCRerr] in which to exhibit the performance measures is
in the form of recall-precision graphs. Such graphs are obtained by looking
at many recall points for each search request, and computing in each case