IRS13
Scientific Report No. IRS-13 Information Storage and Retrieval
Test Environment
E. M. Keen
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
The following types of dictionaries have been tested in retrieval runs:
1. Suffix "s" only, in which request and document words
are matched as they stand, with only the terminal "s"
denoting plurals being removed. See Section VI.
2. Stems (Null dictionary), in which matching is based on
word stems as identified by an automatic suffix removal
procedure. See Section VI.
3. Thesaurus, where words (mainly stems) are grouped together
on the basis of synonymy or partial synonymy,
normally using human judgment. See Section VII.
4. Statistical association (Concon), where synonyms or
related words are identified automatically by using
cooccurrence frequency of words in the collection.
Apart from the control parameters which may be varied,
no human judgment is used. See Section IX.
5. Hierarchies, where subject notions are arranged in a
series of subordinate relations, such as genera and
species, whole and part. Hierarchies tested so far use
thesaurus groups, and tests include some of the many
possible strategies of using hierarchies such as going
"up" in the hierarchy to parents, or going "down" to sons.
See Section VII.
6. Phrases, in which recognition of pairs and larger sets
of words is achieved. Phrases are used in conjunction
with thesaurus groups and phrase recognition takes
place when words from the required thesaurus groups
occur within one sentence of the document or request.
See Section VII.
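
The six dictionary types listed above can be pictured as successive ways of mapping words onto matching units. The Python sketch below is illustrative only and is not the procedure used in the reported runs: the suffix list, the thesaurus groups, the hierarchy links, the phrase definition, and all function names are hypothetical, chosen to make each idea concrete.

```python
from collections import Counter
from itertools import combinations


def strip_plural_s(word):
    """Type 1: match words as they stand, removing only a terminal 's'."""
    w = word.lower()
    return w[:-1] if len(w) > 1 and w.endswith("s") else w


# Hypothetical suffix list; the report's actual suffix removal rules are not shown here.
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word):
    """Type 2: reduce a word to a stem by removing one suffix."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


# Hypothetical thesaurus: stem -> concept group number (assigned by human judgment).
THESAURUS = {"comput": 1, "calculat": 1, "retriev": 2, "search": 2}


def thesaurus_concepts(words):
    """Type 3: replace stems by the thesaurus groups they belong to."""
    return {THESAURUS[s] for s in map(crude_stem, words) if s in THESAURUS}


def cooccurrence_counts(documents):
    """Type 4: count the documents in which each pair of stems occurs together."""
    counts = Counter()
    for doc in documents:
        stems = sorted({crude_stem(w) for w in doc})
        counts.update(combinations(stems, 2))
    return counts


# Hypothetical hierarchy over thesaurus groups: child group -> parent group.
PARENT = {1: 10, 2: 10}


def expand_up(concepts):
    """Type 5: add the parents of the given groups (going "up")."""
    return concepts | {PARENT[c] for c in concepts if c in PARENT}


def expand_down(concepts):
    """Type 5: add the sons of the given groups (going "down")."""
    return concepts | {child for child, parent in PARENT.items() if parent in concepts}


# Hypothetical phrase: recognized when groups 1 and 2 occur in the same sentence.
PHRASES = {"phrase_A": {1, 2}}


def phrases_in_sentence(sentence_words):
    """Type 6: recognize phrases whose required groups all occur in one sentence."""
    found = thesaurus_concepts(sentence_words)
    return {name for name, groups in PHRASES.items() if groups <= found}
```

In this sketch a request and a document would be compared on whichever units the chosen dictionary produces: words with the plural "s" removed, stems, thesaurus group numbers (optionally expanded through the hierarchy), or phrase identifiers. The cooccurrence counts would serve only as raw material for deciding which stems to treat as related.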