IRS13
Scientific Report No. IRS-13 Information Storage and Retrieval
Thesaurus, Phrase and Hierarchy Dictionaries
chapter
E. M. Keen
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
Examples of the improvement given to two of the non-staff requests
by the hierarchy are given in Fig. 31. Request Q006 asks for documents
about information retrieval using computers, and concept 26 "retrieval"
is linked in the hierarchy to "parent" concept 200 "data-processing",
"data handling", etc. All six relevant documents also contain concept
200 as a result of the hierarchy expansion; one document did not originally
contain concept 26, and so obtained concept 200 from "sons" other than
concept 26; the other documents achieved high weights on concept 200
through a similar connection. Thus, concept 200 is chiefly responsible
for the sharp improvement in performance, mainly by increasing the
weight given to the notion vital to the request.
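The expansion mechanism described above can be illustrated with a minimal sketch. It assumes that each concept has at most one parent and that a parent receives the summed weight of the lower-level concepts present in the vector; the report does not state the actual weighting rule, so the `parent_factor` parameter, the function name, and child concepts 27 and 28 are hypothetical. Only the link from concept 26 to concept 200 is taken from the text.

```python
from collections import defaultdict

# Hypothetical hierarchy fragment: child concept -> parent concept.
# Only the 26 -> 200 link comes from the text; 27 and 28 are invented children.
PARENT_OF = {26: 200, 27: 200, 28: 200}


def expand_with_parents(vector, parent_of, parent_factor=1.0):
    """Return a copy of a concept vector with parent concepts added.

    Each concept that has a parent contributes parent_factor * weight to
    that parent, so a parent shared by several children accumulates weight.
    The weighting rule is an assumption made for illustration.
    """
    expanded = defaultdict(float)
    for concept, weight in vector.items():
        expanded[concept] += weight
        parent = parent_of.get(concept)
        if parent is not None:
            expanded[parent] += parent_factor * weight
    return dict(expanded)


request = {26: 1.0}                   # request containing "retrieval"
doc_with_26 = {26: 1.0, 27: 1.0}      # document containing concept 26
doc_without_26 = {27: 1.0, 28: 1.0}   # reaches 200 only through other children

expanded_request = expand_with_parents(request, PARENT_OF)
expanded_doc_a = expand_with_parents(doc_with_26, PARENT_OF)
expanded_doc_b = expand_with_parents(doc_without_26, PARENT_OF)
```

Under these assumptions the second document, which never contained concept 26, still acquires concept 200 and can therefore match the expanded request.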
Request Q015 has six concepts in the request when the thesaurus
is used, and this is expanded to twenty-six when the hierarchy "all"
relation is in use. Document 200 has a greatly improved rank on
hierarchy because all but two of the request concepts added by
hierarchy are matched, giving a total match of 23 out of 26 on
hierarchy, compared with 5 out of 6 matches on thesaurus. In
general, it is clearly unusual for a document to match with nearly all
the hierarchy expansions in a given request, and the case of document 200
may be a special one. Documents 106 and 382 both exhibit cases of hierarchy
acting as a recall device, since request concepts 383 ("Transcendental")
and 618 ("Function") do not match with the thesaurus, but do match with
hierarchy through "brothers" and "cross reference" relations.
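The match counts quoted for request Q015 and document 200 can be reproduced with a small sketch. The concept identifiers below are placeholders and not the real concept numbers of the request; only the counts (6 request concepts on thesaurus, 26 on hierarchy, 5 and 23 matches) come from the text, and the helper `match_count` is a hypothetical name.

```python
def match_count(request_concepts, document_concepts):
    """Return (matched, total): how many request concepts occur in the document."""
    request_concepts = set(request_concepts)
    matched = len(request_concepts & set(document_concepts))
    return matched, len(request_concepts)


# Placeholder identifiers, not the real concept numbers of request Q015.
thesaurus_request = set(range(1, 7))       # 6 concepts with the thesaurus
hierarchy_added = set(range(101, 121))     # 20 concepts added by the hierarchy
hierarchy_request = thesaurus_request | hierarchy_added

# A document matching 5 of the 6 original concepts and all but two added ones.
document = (thesaurus_request - {6}) | (hierarchy_added - {119, 120})

print(match_count(thesaurus_request, document))   # (5, 6)
print(match_count(hierarchy_request, document))   # (23, 26)
```

The same count also shows the recall effect: a request concept that finds no match before expansion can still contribute a match once related concepts are added to both vectors.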
This points to the probable reason why the hierarchy as tested
is not generally effective: because the use of thesaurus groups to build