Scientific Report No. ISR-10
Information Storage and Retrieval

Synopsis

Joseph John Rocchio
Harvard University
Gerard Salton

Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

A model for document retrieval systems consisting of the functional elements of indexing, search request formulation, and request-document matching is examined in this thesis. Analysis of the request formulation function leads to the definition of an optimality criterion and to a request optimization algorithm suitable for use in a system environment which allows iterative searching and real-time user-system interaction. The optimality criterion has applicability to the evaluation of index languages, and generally to the design of evaluation tests for document retrieval systems. Investigation of request-document matching leads to a novel automatic classification algorithm applicable to metric comparison measures, and useful for establishing an efficient storage organization. Finally, the statistical basis for the evaluation of document retrieval systems is reviewed, and some novel performance measures are proposed which are particularly suited to systems which induce a retrieval ordering on the members of the searched collection.

Chapter 1 is introductory in nature and attempts to define the area of discourse and its relation to the general field of information retrieval. A general model for document retrieval systems is introduced, and the basic functional elements of this model are briefly outlined. This material draws heavily on the work of Salton and the SMART automatic document retrieval project, as well as the general literature of the field.