Laboratory tests: automatic systems
Robert N. Oddy
In: Karen Sparck Jones (ed.), Information Retrieval Experiment. Butterworth & Company.
In some cases, the full text of the documents may be available in machine-readable
form. More frequently, it will be the abstracts that are available, or perhaps
only the titles. Index terms may have been assigned to the documents manually, using
one or more indexing languages. Queries are often expressed as short natural
language questions or statements, and may be accompanied by terms chosen
manually from an indexing language. The constitution of the raw data will,
of course, result from compromises between the experimenter's objectives
and the data available. Each query must have associated with it a list of
documents judged relevant: the judgements are usually recorded as points on
a simple ordinal scale of relevance. Before retrieval tests are conducted, the
collection is typically transformed into a convenient numerical form.
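As a concrete illustration, the sketch below (in Python) models such judgements as records on a hypothetical four-point ordinal scale. The field names and the scale itself are assumptions chosen for the example, not the conventions of any particular collection.

```python
# A minimal sketch of relevance judgements on an ordinal scale.
# The four-point scale and field names are illustrative assumptions;
# real collections define their own scales (often just relevant/not).

from dataclasses import dataclass

@dataclass(frozen=True)
class Judgement:
    query_id: int   # serial number of the query
    doc_id: int     # serial number of the judged document
    grade: int      # ordinal relevance: 0 = not relevant .. 3 = highly relevant

judgements = [
    Judgement(query_id=1, doc_id=42, grade=3),
    Judgement(query_id=1, doc_id=57, grade=1),
    Judgement(query_id=2, doc_id=42, grade=0),
]

# The list of documents judged relevant to a query, at a chosen cut-off grade:
def relevant_docs(query_id, min_grade=1):
    return [j.doc_id for j in judgements
            if j.query_id == query_id and j.grade >= min_grade]

print(relevant_docs(1))  # -> [42, 57]
```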
Some test collections are used by a number of researchers working in
different institutions. The collection constructed by Cleverdon in the mid-
1960s at Cranfield14 is one that has seen, perhaps, the heaviest use in
computer-based tests, although it was built for a manually executed
experiment. An experimenter who chooses to make use of an existing test
collection may use the raw data, if he wishes to try a new text processing
technique, or he may be able to take advantage of existing indexing as
embodied in the numerically coded data.
All but a small minority of experiments have assumed a very simple
structure for document descriptions and queries; the same structure serves
for both. A document (or query) is represented by a set of weighted terms, for
example:
cluster (6), file (4), method (4), document (3), single-link (3), hierarchy (3),
algorithm (2), compare (2), time (2), heuristic (1), theory (1), ...
The weights are derived from the text and usually indicate the importance,
in some sense, of the term to the subject matter of the document (query). The
terms can be numbered serially (in an arbitrary order), and if the collection
is static, as it invariably is in laboratory tests, the representative can be
described as a vector whose elements are term weights. This is the symbolism
employed in most of the literature on the SMART experiments8. In many
tests, a special instance of this representational structure is used, namely a set
of unweighted terms: if the weights are all the same, they can be dropped
from the descriptions. The more complex structures which have appeared in
experimental work, such as term classes2 and document clusters3, 8, 24, are
derived from simple representatives like these.
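The sketch below shows how a weighted-term description of this kind might be turned into the vector form used in the SMART literature, given a fixed serial numbering of the terms. The particular term numbers are an arbitrary assumption made for the example.

```python
# A sketch of the vector view of a document description, assuming a
# fixed, arbitrary serial numbering of the collection's terms.

# Arbitrary term-number assignments for the example description above.
term_numbers = {
    'cluster': 0, 'file': 1, 'method': 2, 'document': 3, 'single-link': 4,
    'hierarchy': 5, 'algorithm': 6, 'compare': 7, 'time': 8,
    'heuristic': 9, 'theory': 10,
}

description = {
    'cluster': 6, 'file': 4, 'method': 4, 'document': 3, 'single-link': 3,
    'hierarchy': 3, 'algorithm': 2, 'compare': 2, 'time': 2,
    'heuristic': 1, 'theory': 1,
}

# Vector whose i-th element is the weight of term number i (0 if absent).
vector = [0] * len(term_numbers)
for term, weight in description.items():
    vector[term_numbers[term]] = weight

print(vector)  # -> [6, 4, 4, 3, 3, 3, 2, 2, 2, 1, 1]

# The unweighted special case: weights all equal, so only the term
# numbers themselves need be kept.
unweighted = sorted(term_numbers[t] for t in description)
```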
In a laboratory situation, it is very unusual for the automatic system to
operate in the presence of human enquirers. Consequently, it is not necessary
to provide facilities for interpreting and displaying textual data in the search
programs. Retrieval and information structuring programs can be simplified
considerably if they do not have to handle text. Therefore, documents,
queries and terms are given numerical names (serial numbers). To the
programs, a document description is a document number followed by a list of
term numbers, each of which may, in some collections, be accompanied by
a numerical weight. A query has the same structure. The relevance
judgements pertaining to a query are typically denoted by a query number
followed by a list of the serial numbers of documents deemed relevant. From
files of records of this sort, other files can be derived by relatively
straightforward programs, for the later convenience of retrieval programs.
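By way of illustration, the sketch below encodes document descriptions, queries and relevance judgements in this numerical form, and derives one typical file built in advance for the convenience of retrieval programs: an inverted file mapping each term number to the documents containing it. The exact record layout is an assumption made for the example.

```python
# A sketch of the numerically coded records described above, and of one
# common derived file: an inverted file from term numbers to documents.

from collections import defaultdict

# Document number followed by a list of (term number, weight) pairs.
documents = {
    101: [(0, 6), (1, 4), (2, 4)],
    102: [(0, 2), (3, 5)],
    103: [(2, 1), (3, 2)],
}

# A query has the same structure as a document description.
queries = {
    1: [(0, 3), (2, 1)],
}

# Query number followed by the serial numbers of documents deemed relevant.
relevance = {
    1: [101, 103],
}

# Derive the inverted file in a single pass over the document records.
inverted = defaultdict(list)
for doc_num, terms in documents.items():
    for term_num, weight in terms:
        inverted[term_num].append((doc_num, weight))

print(inverted[0])     # -> [(101, 6), (102, 2)]
print(relevance[1])    # -> [101, 103]
```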