ISR11
Scientific Report No. ISR-11 Information Storage and Retrieval
Design Criteria for Automatic Information Systems
M. E. Lesk
G. Salton
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
(null concon), where associated word stems are added to the original stems
available for content identification, and the normal word stem process
previously shown in Figs. 6 and 8. For all three subject areas it is
seen that the word stem associations improve the recall values for the
last few documents retrieved, over and above the values obtainable with
the simple word stem matching process.
As an example of the performance of the concept-concept associations,
consider search request QB2, titled "testing automated information systems",
used with the ADI collection. One of the documents in this collection,
number 80x, dealing with "experiments on documentation techniques", is
relevant to the request, but is ranked only 77th out of 82 for the
regular word stem process, because very few of the words used in the
document match the terms of the request. If concept-concept associations
are generated, related terms such as "efficient", "real", "reduce",
"experimental", and "frequency" are added to the request; these added
terms provide a bridge between "test" and "experiments", and between
"information" and "documentation", thus accounting for the improved
performance.
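The bridging effect described above can be illustrated with a minimal sketch. It assumes that associations are scored by the cosine correlation of binary document-occurrence vectors and that terms scoring at or above a fixed threshold are added to the request; the function names, the threshold of 0.5, and the four-document toy collection are hypothetical and are not taken from the ADI data or from the system described in this report.

```python
from collections import defaultdict
from math import sqrt


def term_associations(docs, threshold=0.5):
    """Map each stem to the stems with a similar document-occurrence pattern."""
    postings = defaultdict(set)  # stem -> ids of the documents containing it
    for doc_id, stems in enumerate(docs):
        for stem in stems:
            postings[stem].add(doc_id)

    assoc = defaultdict(set)
    terms = sorted(postings)
    for i, a in enumerate(terms):
        for b in terms[i + 1:]:
            shared = len(postings[a] & postings[b])
            # Cosine correlation of the two binary occurrence vectors
            # (an assumed scoring rule, chosen for illustration).
            score = shared / sqrt(len(postings[a]) * len(postings[b]))
            if score >= threshold:
                assoc[a].add(b)
                assoc[b].add(a)
    return assoc


def expand(query, assoc):
    """Add to the query every stem associated with one of its stems."""
    expanded = set(query)
    for stem in query:
        expanded |= assoc.get(stem, set())
    return expanded


# Hypothetical toy collection of stem sets (not the ADI collection).
docs = [
    {"test", "experiment", "system"},
    {"test", "experiment", "information"},
    {"experiment", "document", "technique"},
    {"information", "document", "retrieval"},
]

assoc = term_associations(docs)
query = {"test", "information"}
expanded = expand(query, assoc)

target = docs[2]  # shares no stem with the original query
plain_overlap = len(query & target)
expanded_overlap = len(expanded & target)
print(plain_overlap, expanded_overlap)
```

In this toy collection the third document has no stem in common with the original request, but the expanded request reaches it through "experiment" (associated with "test") and "document" (associated with "information"), which mirrors the bridge described for request QB2.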
While word-word correlations improve the basic word-stem matching
process for high recall values, Fig. 12 shows that a well-constructed
thesaurus is more powerful than the associative techniques applied to words.
In other words, the thesaurus, which serves much the same purpose as the
associative process, does so more accurately. This leads to the following
conclusion:
Rule 7: Statistical concept-concept associations can be used
to improve recall performance, particularly for collections
for which a well-ordered synonym dictionary does not exist.