IRS13
Scientific Report No. IRS-13 Information Storage and Retrieval
Test Environment
chapter
E. M. Keen
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
1-15
7. Syntax, in which a syntactic analyzer is used to
ensure acceptable grammatical relations between
the component words of the phrases. The only
retrieval results available have appeared in [6].
Although many versions of dictionaries of these types have been
tested on the different collections with their differing subject areas,
these seven general types describe all the kinds of content analysis pro-
cedures that have been tried at the time of this writing. Some of the
descriptions applied to content analysis procedures by the Cranfield
Project are introduced in part 4C for purposes of comparison. One further
optional part of content analysis is the use of weighted rather than
binary concept identifiers for the documents and requests; a description
of this process appears in Section III.
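The difference between binary and weighted concept identifiers can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that a weight is the number of times a concept occurs in the item; the weighting process actually used is the one described in Section III. The function names and concept numbers are hypothetical.

```python
from collections import Counter


def binary_vector(concepts):
    """Binary identifiers: each concept present in the item gets the value 1."""
    return {concept: 1 for concept in set(concepts)}


def weighted_vector(concepts):
    """Weighted identifiers: each concept gets its occurrence count.

    Using the raw count as the weight is an assumption made for this sketch.
    """
    return dict(Counter(concepts))


# Hypothetical concept numbers assigned to one document by a dictionary.
doc_concepts = [101, 205, 101, 310, 101]
print(binary_vector(doc_concepts))
print(weighted_vector(doc_concepts))
```

In the binary form a concept that occurs three times is indistinguishable from one that occurs once; the weighted form keeps that difference available to the matching function.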
The search stage requires some procedures for establishing a
coefficient to reflect the match between requests and documents. This is
then used in SMART to order the search output, thus producing a ranked list
arranged in decreasing correlation order. Such matching functions are dis-
cussed in Sections III and IV.
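As a rough illustration of the search stage, the sketch below computes a correlation coefficient between a request and each document and then lists the documents in decreasing correlation order. It assumes the cosine coefficient as the matching function and represents each vector as a mapping from concept number to weight; the function names and the sample data are hypothetical and do not reproduce the SMART programs.

```python
import math


def cosine(request, document):
    """Cosine correlation between two concept vectors (dict: concept -> weight)."""
    shared = set(request) & set(document)
    dot = sum(request[c] * document[c] for c in shared)
    norm_r = math.sqrt(sum(w * w for w in request.values()))
    norm_d = math.sqrt(sum(w * w for w in document.values()))
    if norm_r == 0 or norm_d == 0:
        return 0.0  # an empty vector correlates with nothing
    return dot / (norm_r * norm_d)


def rank(request, documents):
    """Return (coefficient, document id) pairs in decreasing correlation order."""
    scored = [(cosine(request, vec), doc_id) for doc_id, vec in documents.items()]
    # Ties are broken by document id so the ordering is deterministic.
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


# Hypothetical request and document vectors.
request = {1: 1, 2: 1}
documents = {"A": {1: 1, 2: 1}, "B": {1: 1, 3: 1}, "C": {4: 1}}
for coefficient, doc_id in rank(request, documents):
    print(doc_id, round(coefficient, 4))
```

Because the output is a complete ordering rather than a retrieved set, any cut-off point can be applied to the list afterwards.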
The main input, analysis and search variables are repeated, for convenience,
in Fig. 6. It can be seen that each experimental run must be described
in terms of four variables: indications of document length and dictionary
type are given with each search result, but use of the numeric
vectors weighting scheme and the cosine matching function is always made
unless otherwise indicated. Since several versions of some dictionaries
are available and some additional variables not listed in Fig. 6 have also
been investigated, many hundreds of runs can be made before all possible