ISR10
Scientific Report No. ISR-10 Information Storage and Retrieval
Synopsis
Joseph John Rocchio
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
SYNOPSIS
A model for document retrieval systems consisting of the
functional elements: indexing, search request formulation, and
request-document matching is examined in this thesis. Analysis of
the request formulation function leads to the definition of an
optimality criterion and to a request optimization algorithm suitable
for use in a system environment which allows iterative searching and
real time user-system interaction. The optimality criterion has
applicability to the evaluation of index languages, and generally to
the design of evaluation tests for document retrieval systems.
Investigation of the request-document matching leads to a novel
automatic classification algorithm applicable to metric comparison
measures, and useful for establishing an efficient storage
organization. Finally, the statistical basis for the evaluation of
document retrieval systems is reviewed and some novel performance
measures are proposed which are particularly suited to systems which
induce a retrieval ordering on the members of the searched collection.
Chapter 1 is introductory in nature and attempts to define
the area of discourse and its relation to the general field of
information retrieval. A general model for document retrieval systems
is introduced, and the basic functional elements of this model are
briefly outlined. This material draws heavily on the work of Salton
and the SMART automatic document retrieval project, as well as the
general literature of the field.