ASLIB Cranfield Research Project: Factors Determining the Performance of Indexing Systems: VOLUME 1. Design, Part 1. Text
Cyril Cleverdon
Jack Mills
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
Finally, if none of the above permutations is successful, the question could be
rephrased to read 'Has anyone investigated the conditions at the wall behind a
plane reflected shock front in a real gas by theoretical analysis?'
The semantic difficulties of papers in aerodynamics provided a very stiff
test of the recall devices, and as such it could be considered a suitable subject
area for the test. However, the lack of syntactic difficulties caused a change of
plan, discussed on pages 56 and 57: it was not a practical proposition
to use roles. It is interesting to consider whether another inverse
relationship exists, this time between the semantic and syntactic problems involved
in the indexing of any particular subject field. Alternatively, and possibly more
likely, the position may be that with a mushy subject language, the over-riding
necessity of obtaining a reasonable recall ratio inhibits the use of precision devices;
in other words, the semantic problems are so difficult to solve that they completely
overshadow the syntactic problems. However, in a firm subject language area
the semantic problems are more easily solved, so the syntactic problems loom
larger, and one can afford to use precision devices, such as roles. If either of
these situations exists, it will obviously have consequences for the endeavours to
obtain a common sample of documents that can be used to illustrate and evaluate
different types of systems, such as the work at Chicago. Here the intention is to
have 'an open-ended collection of exemplars of indexing systems applied to a common
sample of documents' (ref. 36). The indications are that any given sample of
documents would favour certain types of index language but handicap others,
depending on their strong and weak points in relation to the devices
intended to overcome the semantic or syntactic problems.
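The trade-off described above is expressed in the two ratios used throughout the Cranfield tests. A minimal Python sketch, assuming each search is represented simply as a set of document identifiers (the identifiers and both searches below are invented for illustration), shows how a precision device that narrows a search can raise the precision ratio at the cost of recall.

```python
from __future__ import annotations


def recall_precision(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Return (recall ratio, precision ratio) for one search.

    recall    = relevant documents retrieved / all relevant documents
    precision = relevant documents retrieved / all documents retrieved
    """
    hits = len(retrieved & relevant)
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical example: a broad search using recall devices only, and the same
# search narrowed by a precision device such as roles.
relevant = {"d1", "d2", "d3", "d4"}
broad = {"d1", "d2", "d3", "d5", "d6", "d7", "d8", "d9"}
narrow = {"d1", "d2", "d5"}

print(recall_precision(broad, relevant))   # (0.75, 0.375)
print(recall_precision(narrow, relevant))  # recall 0.5, precision about 0.667
```

In a mushy subject language the broad search may be the only one that reaches a reasonable recall ratio, which is why precision devices are hard to afford there.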
It would seem that, next to the question of relevance assessments, the determination
of the effect of subject language precision is the most important problem
to be tackled. This is certainly true of experimental situations where it is necessary
to compare the performance of tests based on different document collections. For
instance, in comparing the results of the SMART tests with those of Cranfield
II, it is now possible (by the methods discussed in a later volume of this report) to
normalize the different measures used and to allow for the effect of generality ratio. Since it is
also theoretically possible to match similar types of index languages and the method
of relevance assessment, any remaining difference in performance figures must be
due to the firmness level of the language of the two subject areas, namely computers
and aerodynamics.
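One standard way to see why the generality ratio must be allowed for is that precision, unlike fallout, depends on the proportion of relevant documents in the collection. If G is the generality ratio (relevant documents divided by collection size), R the recall ratio and F the fallout ratio (non-relevant documents retrieved divided by all non-relevant documents), then precision P = RG / (RG + F(1 - G)). The Python sketch below uses this relation as an illustration only; it is not offered as the normalization method of the later volume, and the collection figures are hypothetical.

```python
from __future__ import annotations


def generality(relevant: int, collection_size: int) -> float:
    """Generality ratio: proportion of the collection relevant to a question."""
    return relevant / collection_size


def precision_from(recall: float, fallout: float, g: float) -> float:
    """Precision implied by recall, fallout and generality ratio g."""
    return (recall * g) / (recall * g + fallout * (1.0 - g))


def fallout_from(recall: float, precision: float, g: float) -> float:
    """Fallout implied by recall, precision and g (inverse of precision_from)."""
    return recall * g * (1.0 - precision) / (precision * (1.0 - g))


# Hypothetical: identical search performance (recall 0.6, fallout 0.01)
# in two collections that differ only in generality ratio.
g_low = generality(5, 1000)   # 0.005
g_high = generality(5, 200)   # 0.025
print(round(precision_from(0.6, 0.01, g_low), 3))   # about 0.232
print(round(precision_from(0.6, 0.01, g_high), 3))  # about 0.606
```

The same underlying performance yields very different precision ratios, so comparing raw precision across collections confounds generality with the firmness of the subject language; converting to fallout removes that particular dependence.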
In addition to experimental situations, knowledge of this factor is also important
for the design of an operational system that covers a broad subject field and
therefore spans a wide range of firmness levels. An investigation of this
problem could be attempted by a linguistic analysis of the variation of terms in
different subject fields: how many different terms or phrases can be used to express
the same notion and, conversely, how many meanings a single term has. The
experimental method to be used at Cranfield for investigating the problem will
reverse the procedure of the present project. Instead of testing a large number of
index languages against a single document set, it will be necessary to find the different
performances achieved when a large number of document sets in different subject
fields are tested against a single index language.
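The linguistic analysis suggested above could start from a list of (term, notion) pairs drawn from a subject field, counting for each notion how many different terms express it and for each term how many notions it carries. The Python sketch below assumes such a list already exists; the pairs shown are invented for illustration, and the two averages are only one possible crude indicator of how firm a subject language is.

```python
from __future__ import annotations

from collections import defaultdict


def term_variation(pairs: list[tuple[str, str]]) -> tuple[float, float]:
    """Return (mean terms per notion, mean notions per term).

    Values well above 1.0 on either measure point towards a mushy subject
    language; values close to 1.0 point towards a firm one.
    """
    terms_for_notion: dict[str, set[str]] = defaultdict(set)
    notions_for_term: dict[str, set[str]] = defaultdict(set)
    for term, notion in pairs:
        terms_for_notion[notion].add(term)
        notions_for_term[term].add(notion)
    synonymy = sum(len(t) for t in terms_for_notion.values()) / len(terms_for_notion)
    polysemy = sum(len(n) for n in notions_for_term.values()) / len(notions_for_term)
    return synonymy, polysemy


# Hypothetical sample of term/notion pairs from one subject field.
sample = [
    ("shock front", "shock wave"),
    ("shock wave", "shock wave"),
    ("discontinuity", "shock wave"),
    ("discontinuity", "contact surface"),
    ("wall", "wall"),
]
print(term_variation(sample))  # about 1.67 terms per notion, 1.25 notions per term
```

Repeating the same count for samples from several subject fields would give a rough ordering of those fields by firmness, which could then be set against the performance figures obtained with a single index language.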
No particular fault is at present apparent with regard to the indexing, which proceeded
according to schedule and was completed during the first year of the project. In the