ISR10
Scientific Report No. ISR-10 Information Storage and Retrieval
Introduction
chapter
Joseph John Rocchio
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
thereof, a matching function is desired which is independent of the
vector magnitudes involved. Under these circumstances it is natural to
assume that the information carried by the index vector is contained in
its angular position (i.e. its orientation in the property space). The
matching function assumed, therefore, is the angular distance or a
monotonic function of this distance between the search request vector
and the source document vector, wherein decreasing distance is assumed
to indicate increasing probability of relevance.
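The magnitude-independent matching function described above can be illustrated with a minimal sketch. It assumes index vectors are plain lists of numeric property weights; the function names `angular_distance` and `cosine_match` are hypothetical and do not appear in the report. The cosine is used here as one possible monotonic function of the angular distance, not necessarily the one the report adopts.

```python
import math


def angular_distance(q, d):
    """Angle in radians between request vector q and document vector d."""
    dot = sum(x * y for x, y in zip(q, d))
    norm_q = math.sqrt(sum(x * x for x in q))
    norm_d = math.sqrt(sum(y * y for y in d))
    if norm_q == 0.0 or norm_d == 0.0:
        raise ValueError("the angle is undefined for a zero vector")
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cosine = max(-1.0, min(1.0, dot / (norm_q * norm_d)))
    return math.acos(cosine)


def cosine_match(q, d):
    """Cosine of the angle: decreases monotonically as the angle grows on [0, pi]."""
    return math.cos(angular_distance(q, d))
```

Scaling either vector by a positive constant leaves the angle unchanged, which is the independence from vector magnitude that the text asks for. A smaller angle (larger cosine) is read as a higher assumed probability of relevance.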
D. Terminology
In dealing with the foregoing model, the following
definitions are required:
1) Let D = {D_i} represent the set of source
documents in the natural language comprising the reference
collection.
2) Let Q = {Q_i} represent a set of sample search
requests in the natural language comprising a test set of
retrieval queries.
3) Let T represent the index transformation from the natural
language to the index language. The index image of a
document D_i ∈ D is d_i = T(D_i), and the index image of a
search request Q_i ∈ Q is q_i = T(Q_i). Further, let ... = {d_1, d_2, ...
Where alternative models are considered, e.g. index images
represented by sets rather than vectors, the required notation
will be introduced following the framework defined here.
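To make the role of the index transformation T concrete, the following sketch maps natural-language text to an index image in both of the forms mentioned above, a vector and a set. The report does not specify how T is computed; the fixed vocabulary, the whitespace tokenization, and the function names `index_vector` and `index_set` are all assumptions made purely for illustration.

```python
def index_vector(text, vocabulary):
    """Hypothetical T: one term-count component per vocabulary entry."""
    words = text.lower().split()
    return [words.count(term) for term in vocabulary]


def index_set(text, vocabulary):
    """Hypothetical set-valued T: the vocabulary terms present in the text."""
    words = set(text.lower().split())
    return {term for term in vocabulary if term in words}


# Illustrative vocabulary and document; neither comes from the report.
vocabulary = ["retrieval", "index", "vector"]
document = "Index vector and index language"
d_vec = index_vector(document, vocabulary)
d_set = index_set(document, vocabulary)
```

Under these assumptions the same function would be applied to documents and to search requests alike, so that d_i = T(D_i) and q_i = T(Q_i) lie in the same property space and can be compared directly.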