IRE
Information Retrieval Experiment
Gedanken experimentation: An alternative to traditional system testing?
chapter
William S. Cooper
Butterworth & Company
Karen Sparck Jones
Systems of a list of descriptors accompanied by numeric weights and
separated by commas, as for example:
IRON 50, MANUFACTURING 20, POLLUTION 5
The positive number following each term is to be interpreted as that term's
`probability change factor' or `precision-boosting power', that is, as the
multiplicative factor by which a document's probability of satisfaction is
changed by the presence of the term on the document. Suppose for example
that in a random draw from the entire collection the probability of obtaining
a useful document is 0.001. Then the presence of the weight 50 on the term
IRON indicates that the requestor, if he were to learn that the randomly
drawn document had been indexed under IRON, would raise his personal
estimate of that probability from 0.001 to 0.05. In other words, in a gedanken
experiment in which he compares the density of useful documents in the
whole collection to their density in the set indexed by IRON, he guesses the
latter density to be some fifty times the former. The weights on the other
request terms are interpreted independently in similar fashion.
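
The interpretation of a single weight can be illustrated with a minimal Python sketch. The function names (parse_request, updated_probability, implied_weight) and the document counts in the usage lines are hypothetical, chosen only for illustration; the sketch covers one term at a time and does not attempt the full multi-term computation.

```python
def parse_request(text):
    """Parse a request such as 'IRON 50, MANUFACTURING 20' into {term: weight}."""
    weights = {}
    for item in text.split(","):
        term, value = item.strip().rsplit(None, 1)
        weights[term] = float(value)
    return weights


def updated_probability(prior, weight):
    """Multiply the prior probability of usefulness by a term's change factor."""
    if weight <= 0:
        raise ValueError("a probability change factor must be positive")
    posterior = prior * weight
    if posterior > 1.0:
        raise ValueError("weight is inconsistent with the prior: result exceeds 1")
    return posterior


def implied_weight(useful_total, collection_size, useful_with_term, docs_with_term):
    """Ratio of the density of useful documents under a term to the overall density."""
    overall_density = useful_total / collection_size
    term_density = useful_with_term / docs_with_term
    return term_density / overall_density


request = parse_request("IRON 50, MANUFACTURING 20, POLLUTION 5")
# Prior of 0.001 as in the text; the IRON weight of 50 raises it fifty-fold.
iron_probability = updated_probability(0.001, request["IRON"])
# Hypothetical counts: 100 useful documents in 100,000; 5 useful among 100 under IRON.
iron_weight = implied_weight(100, 100_000, 5, 100)
```

With the figures from the text, updated_probability reproduces the shift from 0.001 to 0.05, and the hypothetical counts passed to implied_weight correspond to a density ratio of about fifty.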
The probabilistic computations needed to estimate a document's probability
of satisfaction on the basis of such a request are involved and will not
be presented here, though we hope to discuss them in a later publication.
They require for their input not only the request weights but also some
indexing statistics, specifically such data as the number of documents in the
collection indexed under IRON, and the number indexed jointly under
IRON and MANUFACTURING. An estimate of the total number of useful
documents in the collection is needed too, but the final output ordering is not
very sensitive to the value supplied for this estimate. An independence
assumption of some sort is needed to circumvent the need for data on such
higher order interactions as the degree of overlap in the collection among
three or more terms. Paul Huizinga of the University of California has
proposed that an independence assumption derived from the maximum
entropy principle may be appropriate for this purpose [11].
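
The indexing statistics named above as inputs (single-term counts and joint counts for pairs of terms) could be gathered along the following lines. This is a sketch under stated assumptions: the function indexing_statistics and the four-document toy index are hypothetical, and the probabilistic computation that would consume these counts is not shown, since the text does not present it.

```python
from collections import Counter
from itertools import combinations


def indexing_statistics(index):
    """Count documents per term and per unordered pair of terms.

    `index` maps a document id to the list of descriptors assigned to it.
    Pair keys are tuples of terms in alphabetical order.
    """
    single_counts = Counter()
    pair_counts = Counter()
    for terms in index.values():
        unique_terms = sorted(set(terms))  # ignore repeated descriptors
        single_counts.update(unique_terms)
        pair_counts.update(combinations(unique_terms, 2))
    return single_counts, pair_counts


# Hypothetical toy collection, for illustration only.
toy_index = {
    "d1": ["IRON", "MANUFACTURING"],
    "d2": ["IRON", "POLLUTION"],
    "d3": ["IRON", "MANUFACTURING", "POLLUTION"],
    "d4": ["POLLUTION"],
}
single_counts, pair_counts = indexing_statistics(toy_index)
```

Counts are kept only up to pairs, in line with the remark that an independence assumption stands in for data on overlaps among three or more terms.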
Example 4: The system designer as gedanken experimenter
For systems where the users formulate their own requests without the aid of
an information professional as intermediary, it might be unrealistic to hope
for meaningful numeric values of the kind required for the query language of
the previous example. A more workable system might merely require that the
user attach to his request terms not numeric weights but non-numeric
symbols indicative of his qualitative judgements of relative likelihood. Here,
for example, is a six-level scale of such judgements:
Symbol   Interpretation
A        Presence of clue would, other things being equal, make it
         vastly more likely that the document is useful. A clue of the
         strongest sort.
B        Presence of clue would make document a much more likely
         candidate. Clue is a typical `good' clue.
C        Presence of clue would make it somewhat more likely that
         the document would prove useful.