Scientific Report No. ISR-10 Information Storage and Retrieval
Introduction
chapter
Joseph John Rocchio
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
experiments reported in this thesis.
3. A Specific Model - The SMART System
The experimental results to be presented here in connection
with the optimization and evaluation algorithms were obtained by
simulation, assuming a specific model for a mechanized document
retrieval system. This model is based on the SMART retrieval system
developed at Harvard University under the direction of Professor Gerard
Salton [15]. The primary features of those elements of the SMART system of
interest here are briefly outlined, so that it may become possible to
refer to them in succeeding chapters. A more thorough outline of the
SMART system is given in Appendix A.
A. Property Vector Indexing
Index images of source documents in the experimental system
are assumed to be property vector representations of document content.
For present purposes it is sufficient to assume that the index image of
a reference document is an n-dimensional vector in a property space in
which the weight or magnitude of a given component (or attribute)
reflects the degree to which that attribute characterizes the content
of the source text. Specifically, the index images experimentally used
were constructed by a thesaurus transformation of the input text. An
attribute of the resulting index space corresponds to a thesaurus
category (group of semantically related natural language terms), and
attribute weight is derived from the frequency of occurrence of the