<DOC> <DOCNO> IRE </DOCNO> <TITLE> Information Retrieval Experiment </TITLE> <SUBTITLE> The Smart environment for retrieval system evaluation-advantages and problem areas </SUBTITLE> <TYPE> chapter </TYPE> <PAGE CHAPTER="15" NUMBER="326"> <AUTHOR1> Gerard Salton </AUTHOR1> <PUBLISHER> Butterworth & Company </PUBLISHER> <EDITOR1> Karen Sparck Jones </EDITOR1> <COPYRIGHT MTH="" DAY="" YEAR="1981" BY="Butterworth & Company"> All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, including photocopying and recording, without the written permission of the copyright holder, application for which should be addressed to the Publishers. Such written permission must also be obtained before any part of this publication is stored in a retrieval system of any nature. </COPYRIGHT> <BODY> 326 The Smart environment for retrieval system evaluation In summary, one concludes that many of the Smart procedures have interesting theoretical properties, in addition to proving effective under various experimental conditions. The intellectual framework under which the Smart system operates makes it easy to add new procedures and to extend operations in various directions. In quite a few cases it becomes possible to prove the usefulness of the techniques formally as well as experimentally. It remains to examine the appropriateness of undertaking a long-term project such as Smart in the retrieval area. This is done in the final section of this report. 15.5 Concluding remarks It is hardly necessary to point out that the Smart system design carries great advantages if one aims to construct a flexible environment for retrieval system experimentation. Whereas in normal environments it becomes necessary to retool before each individual experiment, the Smart system has made it possible to carry out hundreds of different experiments without substantial overhead or expense in program modification or collection preparation. 
Such a flexible environment is to some extent greater than the sum of its parts: after using the system for a while one sees things fall into place; often one can anticipate the evaluation results before actually seeing them, and one obtains an intuitive feeling for the operations of a retrieval system. It is then possible to obtain substantial returns from a continuing experimental project, in exchange for the substantial investment that is necessary in building and maintaining the system over many years. Normally, an experimental system is considered useful because the experimental results can help confirm a variety of formal theories and abstract models for a given process or system of procedures. The Smart system experiments were in fact initiated in an attempt to confirm a variety of theories about the content analysis problem. When an experimental system is sufficiently flexible, it may also be useful in reverse. That is, the test results can help in formulating theories, and formal proofs can sometimes be generated to describe precisely the conditions under which a given experimental process is expected to be useful. Formal results obtained after the fact have thus helped in rendering the Smart test results plausible in areas such as term frequency weighting, term precision weighting, document clustering, and relevance feedback. In addition, the Smart system results have led at least to a rethinking of, and sometimes to actual modifications of, existing retrieval procedures. Since so many different methodologies were actually subjected to intensive tests in areas such as document input, indexing, classification, document-query comparison, output ranking and display, query reformulation, and so on, the Smart system has something to say in most areas relating to information system design. As a result, selected methods that are easy to implement and apparently most productive (term weighting, relevance feedback, etc.) 
have in fact found their way into a number of operating environments. What about the drawbacks of a large and continuing experimental project? Obviously one must be careful about the initial design and about the claims one makes about the results. It is easy to go off on a tangent and to get stuck </BODY> </PAGE> </DOC>