IRE
Information Retrieval Experiment
The Cranfield tests
chapter
Karen Sparck Jones
Butterworth & Company
Karen Sparck Jones
to such questions as determining the time spent on identifying subject content
versus that spent on assigning notations or headings. There were many
problems to be dealt with, and
`we came up against many difficulties that had not been envisaged' (p.27),
but as the Report notes,
`by the time that the first 6000 documents had been dealt with, the indexers
had established a tempo which was maintained to the end of the whole of
the indexing.' (p.25)
Even so, it is clear that indexing to time was a major effort for the indexers.
A point of particular interest is that experience with the indexing led to
developments of the schemes used. Indeed at one stage
`it was clear that with the facet classification we were getting into a
complete mess and a technical revision of the schedules was necessary,
together with some procedural alterations.' (p.28)
Again,
`as indexing continued, the original rules for the alphabetical subject
indexing appeared to be too restrictive and we had to make some slight
modifications.' (p.29)
The UDC presented some, though fewer, problems, while Uniterms
presented the fewest difficulties. The main feature of the whole indexing operation
was the idea of main and subsidiary assignments, i.e. each indexer would
process a batch of documents using scheme A, say, as base, and then supply
appropriate descriptions for B, C and D; he would then process another
batch using B as base, and so on. In fact this approach to avoiding global
biases towards one language had to be simplified, though it was maintained
in essentials. As Cleverdon says,
`it is difficult to know exactly how to assess our experience in setting up
these systems, because few people appear to have attempted to review
objectively their own experience.' (p.29)
These general statements about the four indexing languages are amplified
in detailed discussions of the particular problems encountered with the
individual languages. For example, for the UDC these included the treatment
of synthesis, the interpretation of ambiguous headings for specific technical
concepts, the allocation of concepts where several separate placements
offered, and also the provision of an alphabetical index. For the alphabetical
index an initial problem was that there was no existing index which could
simply be utilized, and the index was built up using rules, developed during
the project, that defined the character of main headings and subheadings; the
particular problems encountered were those of preferred order for different
forms of simple or complex concept modification, of direct versus inverted
entry, and of multiple entries and cross references. The facetted scheme,
which was based on thoroughly argued principles, was specifically designed
for the test. The problems encountered were those of adhering to the
preferred citation order, of maintaining constant word forms for terms in
the chain index, and of entry order in the catalogue. With the Uniterm