IRE
Information Retrieval Experiment
Ineffable concepts in information retrieval
chapter
Nicholas J. Belkin
Butterworth & Company
Karen Sparck Jones
it is difficult to make explicit predictions of behaviour or other empirically
verifiable phenomena on their basis. And, for the same reasons, it is very
difficult to determine reasonable operational definitions for these variables.
In order to achieve these goals, it is usually necessary to go through a number
of subsequent assumptions or hypotheses, each of which is a theoretical
construct in its own right. When one finally gets to some phenomenon that is
operationally definable or empirically observable, the relationship of that
phenomenon to the original theoretical concept is probably very tenuous
indeed. All of the intervening constructs and assumptions mean that it is
unclear just what is being tested in the final experiment or investigation.
Concepts from both the user-related and text-related groups share this problem, and
so, therefore, do those from the group of concepts arising from their
relationships.
For example, consider the problem of operationalizing information need.
Belkin and Oddy [9] have suggested that an 'anomalous state of knowledge'
(ASK) is the basis of any information need, and that information retrieval
systems should attempt to use representations of ASKs as the basis for
retrieval. They regard an ASK as that part of an individual's state of
knowledge which the person considers to be inadequate (anomalous) in
some way. The first problem that arises in trying to make this concept
operational is to decide upon a general schema for representation. On the
basis of psychological arguments, the investigators [29] chose structures
consisting of concepts and relations among the concepts. Next one needs to
decide upon means for obtaining the data from which the representation will
be constructed. They decided to use 'problem statements'; that is, statements
by users about the problem which brought them to an information retrieval
system. This decision was supported by Wersig's [7] argument concerning the
problematic situation, but the method for eliciting these statements had to be
designed from first principles. Finally, a technique for analysing the data and
generating the structure is needed. On the basis of some quite speculative
argument concerning underlying 'cognitive' structures and their reflection in
linguistic structures, and in order to make the problem relatively simple, the
general structure chosen was one of associative relations among concepts.
The concepts were represented by word stems, and the strength of association
was determined by the degree of co-occurrence of words within specified
distances in the text of the problem statement. This entire chain then resulted in a
structure which was claimed to be a representation, at some level, of the ASK
underlying the person's information need. The representation could be
displayed as a graph, with word stems as nodes, associative relations between
nodes represented by edges, and the distances between nodes related to the
strength of their association.
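
The final step of this chain, from problem statement to associative structure, can be made concrete with a minimal sketch. It is an illustration under stated assumptions, not the investigators' actual procedure: the stop list, the crude suffix-stripping stemmer, the window size, and the rule that a co-occurrence at distance d contributes 1/d to the strength of association are all hypothetical choices introduced here.

```python
import re
from collections import Counter

# Hypothetical stop list: an assumption for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to", "for",
              "is", "are", "on", "with", "by", "i", "my"}


def crude_stem(word):
    """Strip a few common English suffixes (a stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def association_graph(text, window=3):
    """Map each unordered pair of stems to an association strength.

    Two stems are associated when they occur within `window` positions
    of each other (positions counted after stop words are removed).
    Each co-occurrence at distance d adds 1/d to the pair's strength;
    this weighting is an assumption of the sketch.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    stems = [crude_stem(t) for t in tokens if t not in STOP_WORDS]
    edges = Counter()
    for i, a in enumerate(stems):
        for j in range(i + 1, min(i + 1 + window, len(stems))):
            b = stems[j]
            if a != b:
                # Sort the pair so that (a, b) and (b, a) share one edge.
                edges[tuple(sorted((a, b)))] += 1.0 / (j - i)
    return dict(edges)


def strongest_associations(graph, n=5):
    """Return the n strongest edges, strongest first."""
    return sorted(graph.items(), key=lambda item: -item[1])[:n]
```

Calling `association_graph(statement, window=2)` on a problem statement yields the edge weights of the graph; a display routine could then place strongly associated stems closer together. Every parameter in the sketch embodies one of the intervening assumptions discussed in the text, and changing any of them changes the resulting 'representation'.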
Consider now what lies between the original theoretical construct (the
notion of an ASK) and its operational definition. There are assumptions and
decisions made about what a state of knowledge is, or could be; about how,
and even whether, some verbal description of an ASK can be elicited; about
the nature of relations between concepts in a state of knowledge; about the
relationship between the distance between words in a text and association
strength of concepts in a state of knowledge; and many more. These
assumptions build one upon the other in an elaborate inference chain, so that
the end product, the representation, is only tenuously related, and in very