Scientific Report No. ISR-10
Information Storage and Retrieval
The Query-Document Matching Function (chapter)
Joseph John Rocchio
Harvard University
Gerard Salton

Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

Figure 4.2 illustrates this process for two Boolean queries and the collection described in Figure 4.1(a).

Another of the operand structures useful for document and query representations is the n-dimensional cartesian vector. Table 4.2 characterizes some of the vector comparison operations of interest. Equality, as in the case of set-represented operands, is too restrictive a criterion for selecting source documents in response to an input query. The vector difference assigns a vector quantity, rather than a single number, to each query-document pair, but its magnitude could be a useful matching criterion. In most cases, however, and particularly in the case of the index images derived by a frequency counting technique (see Chapter 2), the information in the vector image of interest is contained in the relative magnitudes of its components rather than in their absolute magnitudes. This results from the direct dependence of the absolute magnitude on the number of words in the input text. With this assumption, the angular distance function provides the most suitable matching operation for vector-structured information representations.

Data representations with structures considerably more complex than set or vector operands have also been considered for automatic document retrieval systems. Hierarchical arrays [2], tree structures [3], and abstract graphs [4] are among these. With information representations of these types, matching operations are considerably more complex than those described above (see, for example, Sussenguth, reference 4, for a detailed account of graph matching procedures). The price paid, then, for the additional information which can be