IRS13
Scientific Report No. IRS-13 Information Storage and Retrieval
Test Environment
chapter
E. M. Keen
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
1-41
that is valid outside the particular test. Other parameters such as request
length and request concept frequency are used in the study in Section X.
C) Collection Comparisons
The data describing the test environments in Figs. 1, 3, 4, and 5
reveal many points at which the environments differ, such as collection and
request sizes, collection and request average lengths, request generality,
request preparation and relevance decisions, and so on. It is recognized that
at present these variables cannot be controlled well enough to permit
comparisons between collections on the assumption that their effects have
been eliminated. Suitable
control of these and other so far unrecognized variables would permit
comparisons between collections of documents in different subject areas. This
might be of interest since the terminology of different subject areas might be
regarded as lying on a continuum ranging from "hard" or "firm" subject areas
to "soft" or "mushy" ones, as suggested by Cleverdon [16]. This may be a valid
hypothesis, since in data retrieval situations in some areas of chemistry, the
firm language permits high recall simultaneously with high precision,
whereas in other areas such as parts of the social sciences the imprecise
language often produces much poorer precision-recall curves. Alternatively,
it may be the case that subject fields contain sub-areas of soft and firm
terminology: in aerodynamics, for example, descriptions of wing shapes and aspect
ratios seem to be fairly unambiguous, whereas treatment of gas and fluid flow
phenomena seems to abound with ambiguities.