Scientific Report No. IRS-13 Information Storage and Retrieval
An Analysis of the Documentation Requests
E. M. Keen
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
documents use the phrase "abstract mathematics". Only in one case was "abstract"
used in a sense other than "document summary", namely, in "abstract trees".
Interaction with the requestor seems necessary here, or a demand to list at
least some example of "abstract mathematics".
Request A11 reveals the problem of ordinarily common words being
used in a technical sense. Words such as "evaluation" and "need" are
relegated to the common word list when the thesaurus is used, thus leaving a
request specification only in terms of high-frequency words in the collection.
The stem dictionary uses both words and gives a better performance result;
however, since such words frequently occur in non-technical senses, two of the
four relevant documents receive poor (lower than 15) rank positions. There
seems to be no way of coping with such problems except to get requestors to
supply alternative and less ambiguous words where possible.
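The contrast between the two dictionaries can be illustrated with a minimal sketch. It is not the system's actual procedure: the request text, the contents of both word lists, and the function names are hypothetical, chosen only to show how removing common words narrows a request specification. Stemming itself is omitted for brevity.

```python
# Hypothetical common-word list. The report names "evaluation" and "need"
# as members; the remaining entries are assumed for illustration.
COMMON_WORDS = {"evaluation", "need", "the", "of", "for", "in"}

# Hypothetical function-word list used by the stem-dictionary run (assumed).
FUNCTION_WORDS = {"the", "of", "for", "in"}


def thesaurus_terms(request_words):
    """Keep only the words that are NOT on the common-word list."""
    return [w for w in request_words if w not in COMMON_WORDS]


def stem_dictionary_terms(request_words):
    """Keep every word except the function words (no stemming applied here)."""
    return [w for w in request_words if w not in FUNCTION_WORDS]


# An invented request, not the wording of any request in the report.
request = ["need", "for", "evaluation", "of", "retrieval", "systems"]

thesaurus_spec = thesaurus_terms(request)
stem_spec = stem_dictionary_terms(request)
```

Under these assumptions the thesaurus specification keeps only "retrieval" and "systems", whereas the stem-dictionary specification also keeps "need" and "evaluation", which is the behaviour the paragraph above describes.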
Another example of this kind is in request A13, in which "criteria",
"objective" and "evaluation" appear. In this case inclusion of these request
words in the stem dictionary results in good rank positions for 5 of the 6
relevant documents. Where several such ambiguous words occur, the co-occurrence
of all of them in a document in the incorrect sense is less likely; an improved
type of phrase dictionary may overcome the problem.
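The co-occurrence argument can be sketched as a simple test that credits a document only when all of the ambiguous request words appear in it together. This is an illustrative assumption about how such a phrase-dictionary check might work, not a description of an existing dictionary; the document contents and the names used are invented.

```python
# The three ambiguous request words quoted in the text above.
AMBIGUOUS_WORDS = {"criteria", "objective", "evaluation"}


def cooccurrence_match(doc_words, required=AMBIGUOUS_WORDS):
    """Return True only if every required word occurs in the document."""
    return required <= set(doc_words)


# Invented documents for illustration only.
documents = {
    "d1": ["objective", "criteria", "for", "evaluation", "of", "systems"],
    "d2": ["evaluation", "of", "crop", "yields"],
}

# Names of the documents in which all ambiguous words co-occur.
matches = [name for name, words in documents.items()
           if cooccurrence_match(words)]
```

Here only "d1" matches, since "d2" contains "evaluation" alone, presumably in a non-technical sense.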
A problem of synonym recognition is raised by request B13. The
phrase "physical sciences" is really ambiguous and not very well chosen, since
examination of the relevant documents reveals that it covers notions such
as "materials", "chemistry", "engineering", "technology", "missiles and
space technology" and "environmental engineering". The use of such wide
ranging relations in a thesaurus concept would be reflected by a concept number
with very many corresponding words; this would not serve all types of requests
equally well, and would in any case require some recognition of phrases rather