ISR11
Scientific Report No. ISR-11 Information Storage and Retrieval
Information Analysis and Dictionary Construction
G. Salton
M. E. Lesk
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
the corresponding precision. For example, recall may be computed after
retrieving five documents, and again after ten documents, and so on,
in increments of five documents; in each case, the recall presumably
increases, as more relevant documents are retrieved, and the precision
may decrease at the same time if additional irrelevant documents are also
produced. In any case, these several recall-precision points can be
plotted on a curve, and the curves obtained can be averaged for many
search requests. This produces the typical recall-precision graphs used
in the present section.
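
The cutoff procedure described above can be illustrated with a minimal sketch. The function names, the choice of Python, and the decision to average recall and precision at matching cutoffs are assumptions made for illustration; the report does not specify how the individual curves are averaged.

```python
def recall_precision_points(ranked_ids, relevant_ids, step=5):
    """Return (cutoff, recall, precision) after every `step` retrieved documents.

    ranked_ids   -- document identifiers in retrieval order (hypothetical input)
    relevant_ids -- identifiers judged relevant to the search request
    """
    relevant = set(relevant_ids)
    points = []
    for cutoff in range(step, len(ranked_ids) + 1, step):
        # Count the relevant documents among the first `cutoff` retrieved.
        hits = sum(1 for doc in ranked_ids[:cutoff] if doc in relevant)
        recall = hits / len(relevant) if relevant else 0.0
        precision = hits / cutoff
        points.append((cutoff, recall, precision))
    return points


def average_points(per_request_points):
    """Average recall and precision over requests at each shared cutoff.

    Assumption: curves are averaged cutoff by cutoff, truncated to the
    shortest curve; other averaging schemes are possible.
    """
    n = len(per_request_points)
    shortest = min(len(points) for points in per_request_points)
    averaged = []
    for i in range(shortest):
        cutoff = per_request_points[0][i][0]
        avg_recall = sum(points[i][1] for points in per_request_points) / n
        avg_precision = sum(points[i][2] for points in per_request_points) / n
        averaged.append((cutoff, avg_recall, avg_precision))
    return averaged
```

Each request contributes one list of points; plotting the averaged recall against the averaged precision gives one curve of the kind discussed in this section.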
A) The Null Thesaurus
As previously explained, the null thesaurus is used as part of a
word matching, or word stem matching, procedure. This dictionary can,
however, be used in various different ways: for example, it is possible
to apply the dictionary look-up procedure to whole documents, that is, to
all word stems contained in a given document or to only certain document
excerpts such as titles or section headings; furthermore, a given sequence
number from the null thesaurus can be assigned to a document specification
with a uniform weight if, and only if, the corresponding word stem appears
in the given document; alternatively, the sequence numbers can be weighted
in such a way that the weight of a sequence number reflects the frequency
of occurrences in the document of the corresponding word or word stem.
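
The two weighting options just described can be sketched as follows. This is a hypothetical illustration, not the procedure actually programmed for the experiments: the function names are invented, the input is assumed to be already reduced to word stems, and sequence numbers are assumed to be assigned in order of first appearance.

```python
from collections import Counter


def build_null_thesaurus(documents):
    """Map each distinct word stem to a sequence number.

    Assumption: numbers are assigned in order of first appearance,
    starting at 1; the report does not state the numbering scheme.
    """
    thesaurus = {}
    for stems in documents:
        for stem in stems:
            if stem not in thesaurus:
                thesaurus[stem] = len(thesaurus) + 1
    return thesaurus


def index_document(stems, thesaurus, use_frequency=False):
    """Return {sequence number: weight} for one document or excerpt.

    use_frequency=False -- uniform weight 1 if, and only if, the stem occurs
    use_frequency=True  -- weight equals the number of occurrences
    """
    counts = Counter(thesaurus[stem] for stem in stems if stem in thesaurus)
    if use_frequency:
        return dict(counts)
    return {seq: 1 for seq in counts}
```

Passing only the stems of a title or section heading, rather than those of the whole document, gives the excerpt-based variant mentioned above.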
Typical results obtained with the null thesaurus are shown in Figs.
9 and 10. Fig. 9 exhibits the average output obtained by
using the null thesaurus, first only for word stems occurring in the titles of
the documents, and then for all word stems contained in the complete document