Scientific Report No. ISR-11
Information Storage and Retrieval
Information Analysis and Dictionary Construction (chapter)
G. Salton, M. E. Lesk
Harvard University
Gerard Salton

Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

the corresponding precision. For example, recall may be computed after retrieving five documents, again after ten documents, and so on, in increments of five documents. At each step the recall presumably increases as more relevant documents are retrieved, while the precision may decrease if additional irrelevant documents are also produced. In any case, these recall-precision points can be plotted as a curve, and the curves obtained can be averaged over many search requests. This produces the typical recall-precision graphs used in the present section.

A) The Null Thesaurus

As previously explained, the null thesaurus is used as part of a word matching, or word stem matching, procedure. This dictionary can, however, be used in several ways. First, the dictionary look-up procedure can be applied to whole documents, that is, to all word stems contained in a given document, or only to certain document excerpts such as titles or section headings. Second, a given sequence number from the null thesaurus can be assigned to a document specification with a uniform weight if, and only if, the corresponding word stem appears in the given document; alternatively, the sequence numbers can be weighted so that the weight of each sequence number reflects the frequency of occurrence in the document of the corresponding word or word stem.

Typical results obtained with the null thesaurus are shown in Figs. 9 and 10. Fig. 9 exhibits the average output obtained by using the null thesaurus, first only for word stems occurring in the titles of the documents, and then for all word stems contained in the complete document