IRE: Information Retrieval Experiment. Chapter: Ineffable concepts in information retrieval. Nicholas J. Belkin. Butterworth & Company. Karen Sparck Jones.
All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, including photocopying and recording, without the written permission of the copyright holder, application for which should be addressed to the Publishers. Such written permission must also be obtained before any part of this publication is stored in a retrieval system of any nature.
documents and needs so as to maximize the matching mechanism's ability to predict the topic relationship between text and need. This assumes that informativeness is an appropriate quality for accomplishing the goal, and furthermore that informativeness is dependent upon aboutness or meaning (topic), while not being identical with either. To test relative informativeness in such a context, it is no longer enough simply to accept relevance judgements, for there is not necessarily a strict correlation between relevance judgements and topic relations. It might, indeed, be possible to use independent evaluations of text and/or need topic as the basis for a test design in this context, but in order to do this properly, the idea of aboutness or meaning which underlies the informativeness notion must be used as the basis for these assignments. Note, however, that the topic assignment must be in some terms other than those of the representational scheme(s) being investigated. The following example discusses some of the inference or interpretation problems that arise in this situation in more detail.
This example of a chain concerns the notion of aboutness as applied to both text and need; that is, synthema or homeosemy. Here we are concerned with the general case of developing or testing a retrieval mechanism based on the degree of synthema or homeosemy between text and need.
In order to do this, one must first begin with some notion of aboutness; say Hutchins'¹⁵ idea that it inheres in the thematic structure of the document as a whole. From this basic idea, one then needs to develop an analytical technique for obtaining a representation of aboutness from the document structure. This technique will have its theoretical basis in text-linguistics, and will indicate the significant concepts of the document and their interrelations (say). One could, perhaps, use the resulting structure directly for matching purposes, or reduce it to, say, a set of index terms. Such reduction would again be based upon an assumption that aboutness can be adequately represented by a set of single concepts. So here is an aboutness representation of the document, which one wishes to match against an aboutness representation of a need. Notice how many assumptions have been made here. More are needed when one comes to the information need representation. Thus the first assumption concerning the need must be that what the need is about is indeed capable of being precisely expressed linguistically. This assumption leads one to a technique for eliciting a statement of need from an information retrieval system user, which can be analysed and represented by the techniques used to analyse and represent the document (or at least by techniques that result in similar structures). These steps assume that documents and questions (linguistic need representations) are basically similar in their aboutness structures. Given this assumption, one then matches the two representations against one another, in order to judge their 'likeness'. The question of likeness then introduces the need for a whole new set of assumptions, concerning the scale along which likeness will be determined.
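To make concrete why the choice of scale is itself an assumption, the following is a minimal sketch and not a method taken from the text. It assumes that both document and need have already been reduced to sets of index terms, and it compares two candidate scales of likeness: a count of shared terms, and the cosine of the angle between binary term vectors. The function names and the sample term sets are hypothetical and chosen only for illustration.

```python
import math


def coordination_level(doc_terms, need_terms):
    """Count the index terms shared by the two representations."""
    return len(set(doc_terms) & set(need_terms))


def cosine_likeness(doc_terms, need_terms):
    """Cosine of the angle between the two binary term vectors.

    For binary vectors the dot product is the number of shared terms,
    and each vector's length is the square root of its term count.
    """
    d, n = set(doc_terms), set(need_terms)
    if not d or not n:
        return 0.0  # an empty representation has no direction
    return len(d & n) / math.sqrt(len(d) * len(n))


# Hypothetical index-term representations (illustrative only).
need = {"retrieval", "evaluation", "relevance"}
doc_a = need | {"t1", "t2", "t3", "t4", "t5", "t6", "t7", "t8", "t9"}
doc_b = {"retrieval", "relevance"}

for name, doc in (("doc_a", doc_a), ("doc_b", doc_b)):
    print(name, coordination_level(doc, need), round(cosine_likeness(doc, need), 3))
```

Under these assumptions the long document doc_a shares more terms with the need, while the short document doc_b lies at a smaller angle to it, so the two scales rank the same pair of documents in opposite orders. Neither ranking is privileged by the representations themselves; the preference comes from the assumed scale.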
One solution in information retrieval has been to accept indexing-type representations, and then to assume that the degree of synthema is related to the overlap of index terms between the two representations (level of co-ordination). Other solutions include spatial or vector analogies, in which the distance between two points in a space, or the angle between two vectors³¹,³² is a measure of the likeness of the document and need represented by the two entities in the space. Notice that any of these solutions requires strong