Information Retrieval Experiment

Introduction
Karen Sparck Jones
Butterworth & Company

Karen Sparck Jones. All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, including photocopying and recording, without the written permission of the copyright holder, application for which should be addressed to the Publishers. Such written permission must also be obtained before any part of this publication is stored in a retrieval system of any nature.

This book is about information retrieval experiment. Documentary information systems have changed in many ways in the last 20 years. The rapid growth of specialized literature has encouraged an intellectual development, post-coordination, and a technological development, the use of computers. Many questions about effective methods of document identification, and about efficient methods of document management, have naturally followed.

These questions about the substantive and the economic aspects of retrieval systems have provoked a whole range of studies. Some of these may be described as investigations; others can properly be described as experiments. They are sometimes associated with generalizations or sometimes, more strongly, with theories or models.

These studies taken together have produced some moderately solid results, and have to some extent enlarged our understanding of the way retrieval systems work. For example, it appears that good retrieval performance lies in the 40-60 per cent recall and precision area; and it seems that some probabilistic models offer valuable insights into system behaviour. But research progress has perhaps been less than might have been expected, and information system practice has in essentials been extremely conservative.

The 1958 International Conference on Scientific Information, held in Washington, was widely felt to mark the beginning of a new era in information processing.
The novel ideas and techniques to be developed were symbolized by the 'auto-abstracts' of conference papers produced by Luhn. By 1978 computers were firmly established in information work, but primarily, and almost entirely, for clerical operations: the crucial information processes of document and request characterization and matching are done by human beings along conventional lines. There is no very good reason to suppose that the conventional methods are best, even in principle, let alone practice. There is in particular no good reason to believe that they are based on any thorough understanding of the nature of information systems. Information systems are manifestly complex. It is by no means certain that information science exists: but it is clear that information processes need scientific study. The requirement for, and role of, experiments in such a study is clear. A good deal of experimental and investigative work has been done in the last two decades; but while results