ISR11 Scientific Report No. ISR-11
Information Storage and Retrieval
Information Analysis and Dictionary Construction chapter
G. Salton, M. E. Lesk
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

cases where all word stems included in the complete document abstract are matched (full null), and where all word stems are used but stems included in document titles are weighted twice as heavily as other word stems (null title 2). As can be seen, there is little to choose between these two methods, although the increased title weights seem to perform slightly better at high recall points.

It should be noted that both of the complete word matching procedures produce very high precision when recall is low. This reflects the fact that the documents which exhibit the highest similarity with the search requests, and which are therefore retrieved early in a given search operation (assuming that documents are retrieved in decreasing order of similarity with the search requests), may be expected to be almost all relevant to the given request. In other words, a word matching procedure will be useful if the requestor wishes to see only a few documents and does not insist on obtaining everything that is relevant within a given collection. The more sophisticated thesaurus procedures may then be expected to be useful mainly for raising precision at high recall values, that is, for retrieving documents which cannot be immediately obtained by a word matching process.

Fig. 10 shows that the word matching procedure which assigns weights to the stems in proportion to their frequency within a given document (full null) is much more effective than the equivalent matching process in which weights are disregarded (null logvec).
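The matching options compared here can be illustrated with a small sketch. It is not the SMART implementation: it assumes the cosine correlation as the similarity measure, treats words as stems without any real suffix removal, and uses a hypothetical three-document collection. The names `stem_vector`, `cosine`, `rank`, and `recall_precision` are illustrative only. The sketch builds frequency-weighted vectors (as in full null), vectors with doubled title weights (as in null title 2), and unit-weight logical vectors (as in null logvec), ranks documents in decreasing order of similarity with the request, and records a recall-precision pair each time a relevant document is retrieved.

```python
import math
from collections import Counter


def stem_vector(abstract, title="", title_weight=1, logical=False):
    """Weight each stem by its frequency in the document.

    Assumption: words are used as stems unchanged (no suffix stripping).
    Each occurrence of a title word adds `title_weight` instead of 1, so
    title_weight=2 mimics the "title 2" option. With logical=True every
    stem present gets weight 1, as in a logical vector.
    """
    vec = Counter(abstract.lower().split())
    for word in title.lower().split():
        vec[word] += title_weight
    if logical:
        vec = Counter(dict.fromkeys(vec, 1))
    return vec


def cosine(a, b):
    """Cosine correlation between two sparse stem vectors (assumed measure)."""
    dot = sum(w * b[t] for t, w in a.items())  # missing stems count as 0
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
        sum(w * w for w in b.values())
    )
    return dot / norm if norm else 0.0


def rank(query_vec, doc_vecs):
    """Document ids in decreasing order of similarity (ties broken by id)."""
    scores = {d: cosine(query_vec, v) for d, v in doc_vecs.items()}
    return sorted(scores, key=lambda d: (-scores[d], d))


def recall_precision(ranking, relevant):
    """(recall, precision) each time another relevant document is retrieved."""
    points, hits = [], 0
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / k))
    return points


# Hypothetical toy collection: id -> (title, abstract)
DOCS = {
    "d1": ("document retrieval", "automatic retrieval by word matching"),
    "d2": ("payroll files", "fast retrieval of payroll records"),
    "d3": ("text indexing", "assigning content identifiers to texts"),
}
QUERY = stem_vector("automatic document retrieval")
RELEVANT = {"d1", "d3"}  # d3 is relevant but shares no word with the request

VARIANTS = {
    "full": dict(title_weight=1),
    "title 2": dict(title_weight=2),
    "logvec": dict(logical=True),
}

if __name__ == "__main__":
    for name, opts in VARIANTS.items():
        vecs = {d: stem_vector(ab, ti, **opts) for d, (ti, ab) in DOCS.items()}
        ranking = rank(QUERY, vecs)
        print(name, ranking, recall_precision(ranking, RELEVANT))
```

In this toy collection all three options place d1 first, so precision is 1.0 at recall 0.5. Document d3 is relevant but shares no word with the request, so it is retrieved last and precision falls to 2/3 at full recall. A thesaurus grouping "document" with "text" would be the kind of device the text expects to help with such documents; the toy data is far too small to reproduce the differences reported in Fig. 10.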
The logical vector process is one where each word stem is assigned the same weight, namely 1, and no distinction is made between more and less important stems. To summarize, then, the word stem matching procedure performs best when all word stems are used from null document abstracts, or full documents,