Scientific Report No. ISR-10 Information Storage and Retrieval
The Query-Document Matching Function
Joseph John Rocchio
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
5. Automatic Document Classification
Traditionally the creation of classification schedules aims at
producing a logically consistent, intelligible structuring of human
knowledge, wherein the organization and structural relations among
subject categories reflect meaningful relations among the fields of
discourse which they represent. Research in automatic classification is
generally limited to a much narrower set of goals. In particular,
automatic classification techniques have, in general, been based on the
state or content of a given collection rather than on the state of
knowledge in a given field. In this sense, then, the object of
automatic classification has been to generate a set of categories which
are in some sense optimal for the collection at hand. [5]
The emphasis of this chapter is placed on the relation of
automatic classification to the problem of search optimization in an
automatic document retrieval system. To this end, then, the basis for
establishing the set of classification categories of a given collection
is specifically identified with increasing the search efficiency of
retrieval operations.
Previous investigations into the feasibility of automatic
classification have regarded the generation of a set of classification
categories, or the automatic assignment of documents into an existing
classification schedule, or both, as primary goals. [5,6] The interest
here, however, is not in the classification system as an end in itself,
but rather as an adjunct to an automatic retrieval system. [7] For present
purposes, then, there need be no a priori constraints on the nature of