Information Retrieval Experiment

The Cranfield tests
Karen Sparck Jones
Butterworth & Company

to such questions as determining the time spent on identifying subject content versus that spent on assigning notations or headings. 'There were many problems to be dealt with, and we came up against many difficulties that had not been envisaged' (p.27), but as the Report notes, 'by the time that the first 6000 documents had been dealt with, the indexers had established a tempo which was maintained to the end of the whole of the indexing.' (p.25) Even so, it is clear that indexing to time was a major effort for the indexers.

A point of particular interest is that experience with the indexing led to developments of the schemes used. Indeed at one stage 'it was clear that with the facet classification we were getting into a complete mess and a technical revision of the schedules was necessary, together with some procedural alterations.' (p.28) Again, 'as indexing continued, the original rules for the alphabetical subject indexing appeared to be too restrictive and we had to make some slight modifications.' (p.29) The UDC presented some, though fewer, problems, while Uniterms presented least difficulties.

The main feature of the whole indexing operation was the idea of main and subsidiary assignments, i.e. each indexer would process a batch of documents using scheme A, say, as base, and then supply appropriate descriptions for B, C and D; he would then process another batch using B as base, and so on.
In fact this approach to avoiding global biases towards one language had to be simplified, though it was maintained in essentials. As Cleverdon says, 'it is difficult to know exactly how to assess our experience in setting up these systems, because few people appear to have attempted to review objectively their own experience.' (p.29)

These general statements about the four indexing languages are amplified in detailed discussions of the particular problems encountered with the individual languages. For example, for the UDC these included the treatment of synthesis, the interpretation of ambiguous headings for specific technical concepts, the allocation of concepts where several separate placements offered, and also the provision of an alphabetical index. For the alphabetical index an initial problem was that there was no existing index which could simply be utilized, and the index was built up using rules developed during the project defining the character of main headings and subheadings; the particular problems encountered were those of preferred order for different forms of simple or complex concept modification, of direct versus inverted entry, and of multiple entries and cross references. The facetted scheme, which was based on thoroughly argued principles, was specifically designed for the test. The problems encountered were those of adhering to the preferred citation order, of maintaining constant word forms for terms in the chain index, and of entry order in the catalogue. With the Uniterm