ISR10
Scientific Report No. ISR-10 Information Storage and Retrieval
The Indexing Function
chapter
Joseph John Rocchio
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
categories or concept codes. Thus a set of semantically associated
natural language terms comprised of synonyms, for example, can be
mapped into a single element in the index language; or a single
natural language term which has several connotations can be identified
with a set of elements in the index language (homonyms might be
treated in this manner). Figure 2.1 provides an illustration by means
of an excerpt from the SMART system thesaurus.
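
The two kinds of correspondence just described (several synonyms to one
index element, and one ambiguous term to several index elements) can be
sketched as a simple lookup table. This is only an illustrative sketch:
the terms, the concept codes, and the name concepts_for are hypothetical
and are not taken from the SMART thesaurus or from Figure 2.1.

```python
# Hypothetical thesaurus: natural language term -> set of concept codes.
# All entries and code numbers are invented for illustration only.
THESAURUS = {
    "car": {101},
    "automobile": {101},   # synonym of "car": same single concept code
    "auto": {101},
    "bank": {202, 305},    # homonym: one term, several concept codes
}


def concepts_for(term):
    """Return the concept codes for a term; an unknown term maps to the empty set."""
    return set(THESAURUS.get(term.lower(), set()))


if __name__ == "__main__":
    for term in ("car", "Automobile", "bank", "river"):
        print(term, sorted(concepts_for(term)))
```

In this sketch a term absent from the table simply contributes nothing to
the index image; a real system might treat such terms differently.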
The notion of a semantically based transformation on a set of
recognizable (by machine) linguistic features (word or stem types,
phrases, etc.) can be generalized to include a variety of the
associations which such elements possess [13]. The index transformation
may be described in this case by considering a multi-stage mapping.
The first step consists in mapping the document into the set of basic
elements which describe it, e.g. into the set of word types it contains.
The second step is a transformation from these elements into a space
of synonymous term groups, i.e. into thesaurus categories. (The
thesaurus mapping described above consists in applying these two basic
transformations.) Additional transformation stages may also be defined.
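
The two basic stages can be sketched as a composition of two functions.
The sketch below is a minimal illustration under stated assumptions, not
the SMART implementation: the function names, the tokenizing rule
(lower-case alphabetic runs), the choice to count word types per
category, and the sample thesaurus are all hypothetical.

```python
import re
from collections import Counter


def word_types(text):
    """Stage 1: map a document into the set of word types it contains."""
    # Assumption: a word type is a lower-cased run of ASCII letters.
    return set(re.findall(r"[a-z]+", text.lower()))


def to_categories(types, thesaurus):
    """Stage 2: map word types into thesaurus categories.

    Returns a Counter giving, for each category, the number of distinct
    word types of the document that fall into it.
    """
    counts = Counter()
    for word in types:
        for category in thesaurus.get(word, ()):
            counts[category] += 1
    return counts


def index_image(text, thesaurus):
    """Compose the two stages: document -> word types -> categories."""
    return to_categories(word_types(text), thesaurus)


if __name__ == "__main__":
    # Hypothetical thesaurus: word type -> list of category numbers.
    sample = {"retrieval": [1], "search": [1], "document": [2], "text": [2]}
    print(dict(index_image("Document retrieval and text search", sample)))
```

Keeping each stage as a separate function means that a further stage can
be added by composing one more function onto the result.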
Thus generic (inclusion) relations exist among semantic elements and
these may be used to define a set of hierarchies. A number of
transformations can be defined based on a set of such relations; thus
a term which includes or which is included by a given term may be
added to or may replace the related term in the document image. The
index image of a document, therefore, can be modified to contain terms
which are generically related to those detected, but not explicitly
present in the input text. Relations among index terms other than