Scientific Report No. ISR-10
Information Storage and Retrieval
The Query-Document Matching Function
Joseph John Rocchio
Harvard University
Gerard Salton

Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

5. Automatic Document Classification

Traditionally the creation of classification schedules aims at producing a logically consistent, intelligible structuring of human knowledge, wherein the organization and structural relations among subject categories reflect meaningful relations among the fields of discourse which they represent. Research in automatic classification is generally limited to a much narrower set of goals. In particular, automatic classification techniques have, in general, been based on the state or content of a given collection rather than on the state of knowledge in a given field. In this sense, then, the object of automatic classification has been to generate a set of categories which are in some sense optimal for the collection at hand. [5]

The emphasis of this chapter is placed on the relation of automatic classification to the problem of search optimization in an automatic document retrieval system. To this end, then, the basis for establishing the set of classification categories of a given collection is specifically identified with increasing the search efficiency of retrieval operations.

Previous investigations into the feasibility of automatic classification have regarded the generation of a set of classification categories, or the automatic assignment of documents into an existing classification schedule, or both, as primary goals. [5, 6] The interest here, however, is not in the classification system as an end in itself, but rather as an adjunct to an automatic retrieval system. [7] For present purposes, then, there need be no a priori constraints on the nature of