IRE
Information Retrieval Experiment
Ineffable concepts in information retrieval
chapter
Nicholas J. Belkin
Butterworth & Company
Karen Sparck Jones
documents and needs so as to maximize the matching mechanism's ability to
predict the topic relationship between text and need. This assumes that
informativeness is an appropriate quality for accomplishing the goal, and
furthermore that informativeness is dependent upon aboutness or meaning
(topic), while not being identical with either. To test relative informativeness
in such a context, it is no longer enough simply to accept relevance
judgements, for there is not necessarily a strict correlation between relevance
judgements and topic relations. It might, indeed, be possible to use
independent evaluations of text and/or need topic as the basis for a test
design in this context, but in order to do this properly, the idea of aboutness
or meaning which underlies the informativeness notion must be used as the
basis for these topic assignments. Note, however, that the topic assignments must
be in some terms other than those of the representational scheme(s) being
investigated. The following example discusses some of the inference or
interpretation problems that arise in this situation in more detail.
This example of a chain concerns the notion of aboutness as applied to
both text and need; that is, synthema or homeosemy. Here we are concerned
with the general case of developing or testing a retrieval mechanism based on
the degree of synthema or homeosemy between text and need. In order to do
this, one must first begin with some notion of aboutness, say Hutchins' idea¹⁵
that it inheres in the thematic structure of the document as a whole. From
this basic idea, one then needs to develop an analytical technique for
obtaining a representation of aboutness from the document structure. This
technique will have its theoretical basis in text-linguistics, and will indicate
(say) the significant concepts of the document and their interrelations. One
could, perhaps, use the resulting structure directly for matching purposes, or
reduce it to, say, a set of index terms. Such reduction would again be based
upon an assumption that aboutness can be adequately represented by a set of
single concepts. So here is an aboutness representation of the document,
which one wishes to match against an aboutness representation of a need.
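The reduction step, and what it assumes, can be made concrete. The sketch below supposes, purely for illustration, that an aboutness structure is a set of (concept, relation, concept) triples; this representation, the function name `reduce_to_index_terms` and the sample concepts are hypothetical, and none of it is Hutchins' text-linguistic analysis. It shows the cost of the single-concept assumption: two structures that relate the same concepts in different ways reduce to the same set of index terms.

```python
# Hypothetical aboutness structure: a set of (concept, relation, concept)
# triples. This is an illustrative assumption, not Hutchins' method.
Triple = tuple[str, str, str]


def reduce_to_index_terms(structure: set[Triple]) -> frozenset[str]:
    """Reduce a concept-relation structure to a set of single concepts.

    The relations between concepts are discarded, which is exactly the
    assumption that aboutness is adequately captured by index terms alone.
    """
    terms: set[str] = set()
    for head, _relation, tail in structure:
        terms.add(head)
        terms.add(tail)
    return frozenset(terms)


# Two hypothetical documents that relate the same concepts in opposite ways.
doc_a = {("pesticide use", "causes", "bird decline")}
doc_b = {("bird decline", "prompts restriction of", "pesticide use")}

if __name__ == "__main__":
    print(doc_a == doc_b)  # False: the structures differ
    print(reduce_to_index_terms(doc_a) == reduce_to_index_terms(doc_b))  # True
```

Once reduced, nothing in the index-term sets distinguishes the two documents, so any matching mechanism built on them inherits that loss.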
Notice how many assumptions have been made here. More are needed
when one comes to the information need representation. Thus the first
assumption concerning the need must be that what the need is about is
indeed capable of being precisely expressed linguistically. This assumption
leads one to a technique for eliciting a statement of need from an information
retrieval system user, which can be analysed and represented by the
techniques used to analyse and represent the document (or at least by
techniques that result in similar structures). These steps assume that
documents and questions (linguistic need representations) are basically
similar in their aboutness structures. Given this assumption, one then
matches the two representations against one another, in order to judge their
'likeness'.
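Once both documents and needs have been reduced to index-term sets, 'likeness' still has to be put on some scale. The sketch below computes three of the scales taken up in this chapter: co-ordination level (the number of shared terms), the angle between term vectors (expressed as its cosine), and the distance between two points in term space. It assumes binary (present/absent) term weights, which the chapter does not specify; the function names and sample terms are hypothetical.

```python
import math

# Assumption: documents and needs are sets of index terms with binary
# (present/absent) weights. Function names are illustrative only.


def coordination_level(doc_terms: set[str], need_terms: set[str]) -> int:
    """Number of index terms shared by the two representations."""
    return len(doc_terms & need_terms)


def cosine(doc_terms: set[str], need_terms: set[str]) -> float:
    """Cosine of the angle between binary term vectors.

    With 0/1 weights the dot product is the number of shared terms and
    each vector's length is the square root of its number of terms.
    """
    if not doc_terms or not need_terms:
        return 0.0
    shared = len(doc_terms & need_terms)
    return shared / math.sqrt(len(doc_terms) * len(need_terms))


def euclidean_distance(doc_terms: set[str], need_terms: set[str]) -> float:
    """Distance between the two points in term space.

    With 0/1 weights each term present in only one set contributes 1 to
    the squared distance.
    """
    return math.sqrt(len(doc_terms ^ need_terms))


doc = {"pesticide use", "bird decline", "agriculture"}
need = {"pesticide use", "bird decline"}

if __name__ == "__main__":
    print(coordination_level(doc, need))  # 2
    print(round(cosine(doc, need), 4))    # 0.8165
    print(euclidean_distance(doc, need))  # 1.0
```

The three measures need not agree: co-ordination level ignores how many terms each representation has, while the cosine normalizes for length and the distance penalizes every unmatched term. Choosing one is therefore itself an assumption about the scale of likeness.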
The question of likeness then introduces the need for a whole new set of
assumptions, concerning the scale along which likeness will be determined.
One solution in information retrieval has been to accept indexing-type
representations, and then to assume that the degree of synthema is related to
the overlap of index terms between the two representations (level of
co-ordination). Other solutions include spatial or vector analogies, in which the
distance between two points in a space, or the angle between two vectors³¹,³²,
is a measure of the likeness of the document and need represented by the two
entities in the space. Notice that any of these solutions requires strong