IRS13
Scientific Report No. IRS-13 Information Storage and Retrieval
Test Environment
chapter
E. M. Keen
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
1-18
class definition. With the suffix `s' dictionary used as a starting point,
the stems, thesaurus, hierarchy, or concon (statistical association) all
constitute the recall devices, because in each case the suffix `s' content
identifiers are replaced by concepts representing a whole grouping of
words according to the principle used by the dictionary concerned. Phrases,
syntax and the weighted concept identifiers (numeric vectors) are all
examples of precision devices, as is the major device of coordination,
which is used in every SMART search, since all the matching functions
make use in some way of the number of request terms that match those
in the documents.
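
The distinction can be illustrated with a minimal sketch. It is not SMART code: the thesaurus entries, the concept numbers and the function names are all hypothetical, and the matching function is reduced to a bare count of shared identifiers (the coordination level). Replacing words by concept classes lets related words match, which is the recall effect described above.

```python
# Hypothetical toy thesaurus: word -> concept class number.
# These entries are invented for illustration only.
THESAURUS = {
    "car": 101, "automobile": 101, "vehicle": 101,
    "engine": 102, "motor": 102,
}


def to_identifiers(terms, thesaurus=None):
    """Return the set of content identifiers for a list of terms.

    Without a thesaurus each word is its own identifier. With one,
    every word in a class is replaced by the shared class number.
    """
    if thesaurus is None:
        return set(terms)
    return {thesaurus.get(term, term) for term in terms}


def coordination_level(request_ids, document_ids):
    """Count the request identifiers that also occur in the document."""
    return len(request_ids & document_ids)


request = ["automobile", "engine"]
document = ["car", "motor", "repair"]

# Word-level matching: no word is shared between request and document.
word_match = coordination_level(
    to_identifiers(request), to_identifiers(document)
)

# Concept-level matching: both request words fall into classes that
# the document also contains.
concept_match = coordination_level(
    to_identifiers(request, THESAURUS), to_identifiers(document, THESAURUS)
)

print(word_match, concept_match)
```

In this toy case the document is missed entirely at the word level and matches on both request terms once the class numbers are substituted.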
Although the recall and precision devices are clearly used in
the construction of the dictionaries as described, the use of the dic-
tionaries in SMART does not always produce the expected effect. This
is because the processing techniques possible with automatic systems
can modify or even change the effect of these devices, and it is possible
to use a dictionary which has been constructed on the principle
of a recall device, in such a way that the result in the search becomes
an increase in precision.
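
A minimal sketch of how one grouping of terms can serve either purpose, depending on how the search uses it. The groups, the scoring functions and the bonus value are hypothetical and do not reproduce any particular system: in the first function the groups replace the terms (a recall use), while in the second they only add weight to matches that already exist between request and document (a precision use).

```python
# Hypothetical groups of related terms, invented for illustration.
GROUPS = [{"car", "automobile"}, {"engine", "motor"}]


def group_of(term):
    """Return the index of the group containing term, or None."""
    for index, group in enumerate(GROUPS):
        if term in group:
            return index
    return None


def expanded_score(request, document):
    """Recall-style use: terms are replaced by their group before matching."""
    def convert(terms):
        converted = set()
        for term in terms:
            index = group_of(term)
            converted.add(term if index is None else ("group", index))
        return converted

    return len(convert(request) & convert(document))


def reinforced_score(request, document, bonus=1.0):
    """Precision-style use: only direct term matches count, and a match
    is given a bonus when another member of its group is also present
    in the document."""
    doc_terms = set(document)
    score = 0.0
    for term in set(request) & doc_terms:
        score += 1.0
        index = group_of(term)
        if index is not None and (GROUPS[index] - {term}) & doc_terms:
            score += bonus
    return score


request = ["car", "engine"]
doc_a = ["automobile", "motor"]           # related words only
doc_b = ["car", "automobile", "engine"]   # direct matches plus a group partner

print(expanded_score(request, doc_a), reinforced_score(request, doc_a))
print(expanded_score(request, doc_b), reinforced_score(request, doc_b))
```

Under the reinforcing use, doc_a is no longer retrieved at all, while doc_b is ranked higher than it would be on direct matches alone; the same groups therefore narrow the output instead of widening it.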
An example of this is provided by the work on statistical association
at the Cambridge Language Research Unit (England), where in one
test of their clumping procedure the clumps were seen to be acting
purely as precision devices and not as recall devices at all [9,10].
This occurred simply because the clumps were used as a weighting device
to reinforce certain of the concept matches that already existed with-
out the clumps. Since in SMART concon, hierarchy and phrases are
normally used to add concept numbers to the documents and requests,