Scientific Report No. IRS-13 Information Storage and Retrieval Test Environment chapter E. M. Keen Harvard University Gerard Salton Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

7. Syntax, in which a syntactic analyzer is used to ensure acceptable grammatical relations between the component words of the phrases. The only retrieval results available have appeared in [6].

Although many versions of dictionaries of these types have been tested on the different collections with their differing subject areas, these seven general types describe all the kinds of content analysis procedures that have been tried at the time of this writing. Some of the descriptions applied to content analysis procedures by the Cranfield Project are introduced in part 4C for purposes of comparison. One further optional part of content analysis is the use of weighted rather than binary concept identifiers for the documents and requests; a description of this process appears in Section III.

The search stage requires some procedure for establishing a coefficient that reflects the match between requests and documents. SMART then uses this coefficient to order the search output, producing a ranked list arranged in decreasing correlation order. Such matching functions are discussed in Sections III and IV.

The main input, analysis and search variables are repeated, for convenience, in Fig. 6. It can be seen that each experimental run must be described in terms of four variables: indications of document length and dictionary type are given with each search result, but use of the numeric vectors weighting scheme and the cosine matching function is always made unless otherwise indicated. Since several versions of some dictionaries are available and some additional variables not listed in Fig. 6 have also been investigated, many hundreds of runs can be made before all possible