to assign numbers. He points out that there is nothing sacrosanct about the
well-known scales, and they can be suitably modified by the investigator.
The interaction between the client or user and the search analyst or reference
librarian, known as query negotiation, is being increasingly studied. Zipperer7
has analysed this interaction in terms of nine activity categories:
Question negotiation (presentation of query)
Profile development (vocabulary selection)
Tutorial activities (explanation)
Search type selection (current awareness or retrospective)
Strategy formulation (search statement specification)
System description
Database selection
Administrative procedures
Diversionary activities (interruptions)
Although these activities relate more to batch than online retrieval and
thus might be modified for an interactive environment, it is important that
this kind of analysis be standardized, so that results from different studies
may be compared.
Evaluation
Historically, the 'evidence' of information retrieval experiments has been in
the form of retrieval effectiveness measures, and more specifically recall and
precision. Cleverdon1 pointed out the reason for the continuing popularity
of these two measures:
'The unarguable fact, however, is that they are fundamental requirements
of the users, and it is quite unrealistic to try to measure how effectively a
system or subsystem is operating without bringing in recall and precision.'
How one calculates recall and precision depends on the ordering of the
output. There are four possibilities:
Unordered output, i.e. output is the retrieved set.
Ranked output, with possible ties in ranking.
Totally ranked output, i.e. each document has a unique rank.
Weighted output, i.e. each document has a weight.
If the retrieval set is unordered, then a four-way partition of the full
database is made to determine recall and precision:
a is the number of relevant and retrieved references.
b is the number of non-relevant and retrieved references.
c is the number of relevant and non-retrieved references.
d is the number of non-relevant and non-retrieved references.
n = a + b + c + d is the total number of references in the database.
Four measures have been defined in terms of this partition:
recall = a/(a + c)
precision = a/(a + b)
fallout = b/(b + d)
generality = (a + c)/(a + b + c + d)
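As a rough illustration of these definitions, the following Python sketch computes the four measures directly from the counts a, b, c and d of the partition above; the function name, guard clauses for empty cells, and the example counts are illustrative additions, not part of the original text.

```python
def contingency_measures(a, b, c, d):
    """Compute recall, precision, fallout and generality from the
    four-way partition of the database:

      a: relevant and retrieved
      b: non-relevant and retrieved
      c: relevant and non-retrieved
      d: non-relevant and non-retrieved
    """
    n = a + b + c + d                               # total references in the database
    recall = a / (a + c) if (a + c) else 0.0        # proportion of relevant references retrieved
    precision = a / (a + b) if (a + b) else 0.0     # proportion of retrieved references that are relevant
    fallout = b / (b + d) if (b + d) else 0.0       # proportion of non-relevant references retrieved
    generality = (a + c) / n if n else 0.0          # proportion of the database that is relevant
    return recall, precision, fallout, generality


# Example (hypothetical counts): 30 relevant retrieved, 20 non-relevant
# retrieved, 10 relevant missed, 940 non-relevant not retrieved.
print(contingency_measures(30, 20, 10, 940))
# (0.75, 0.6, 0.0208..., 0.04)
```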