Scientific Report No. IRS-13, Information Storage and Retrieval
Search Matching Functions
E. M. Keen
Harvard University
Gerard Salton

Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

SMART Test Results - Weighting Scheme

A) Description of Weighting Scheme

Weighted document and request vectors, rather than the binary ones presented up to now in considering the overlap and cosine, may be constructed by assigning to each content identifier a 'weight' that reflects the importance or usefulness of that identifier. Since the assignment of weights is ideally done by automatic means, the weighting scheme in use with SMART relies initially on frequency information. When suffix 's' and stem dictionaries are used, concepts are weighted entirely by their frequency of occurrence in the documents (or requests): thus a concept that occurs three times in a document will receive three times the weight of a concept that appears only once.

With a thesaurus dictionary in use, or any dictionary that permits a word to appear in more than one concept group, an additional adjustment of the weight reflects word ambiguity. Thus, if a word appears in more than one concept group it is assumed to be ambiguous, and the weight assigned to the concept number representing the ambiguous word is decreased according to the number of concept groups in which the word appears. Many other modifications to a weighting procedure of this type can be suggested; for example, where abstracts and titles are used, the title words may be given higher weights than the abstract words.

Both the overlap and cosine correlation coefficients may be used with weighted vectors. For example, if a hypothetical request and document are weighted as follows:

Concept          a  b  c  d  e  f  g  h  i  j  k  l  m  n  o  p  q  r  s  t  u
Request Weight   1 2 1 1 13 12
Document Weight  -  -  -  1  3  1  4  2  1  3  7  1  1  2  3  5  1  1  1  1  1
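
The weighting scheme described above can be illustrated with a minimal Python sketch. It is not the SMART implementation, and several details are assumptions made only for illustration. The text says the weight of an ambiguous word is "decreased according to the number of concept groups", but gives no formula, so the sketch simply divides each occurrence by the number of groups the word belongs to. The weighted overlap is taken in one commonly cited form (sum of minimum weights over the smaller of the two weight totals), which this section does not itself spell out. The thesaurus entries, the concept labels `C1` to `C3`, and the function names are all hypothetical.

```python
import math
from collections import defaultdict


def concept_weights(words, thesaurus):
    """Build a weighted concept vector from a list of word occurrences.

    `thesaurus` maps a word to the list of concept groups containing it.
    Each occurrence adds 1/len(groups) to every one of its groups, so an
    unambiguous word adds 1 per occurrence (pure frequency weighting) and
    an ambiguous word adds less. The 1/len(groups) rule is an assumption.
    Words missing from the thesaurus are skipped.
    """
    weights = defaultdict(float)
    for word in words:
        groups = thesaurus.get(word, [])
        for concept in groups:
            weights[concept] += 1.0 / len(groups)
    return dict(weights)


def cosine(request, document):
    """Cosine correlation of two weighted concept vectors (dicts)."""
    dot = sum(w * document.get(c, 0.0) for c, w in request.items())
    norm = math.sqrt(
        sum(w * w for w in request.values())
        * sum(w * w for w in document.values())
    )
    return dot / norm if norm else 0.0


def overlap(request, document):
    """Weighted overlap: sum of minimum weights over the smaller total.

    This form of the weighted overlap is an assumption of the sketch.
    """
    shared = sum(min(w, document.get(c, 0.0)) for c, w in request.items())
    smaller_total = min(sum(request.values()), sum(document.values()))
    return shared / smaller_total if smaller_total else 0.0


# Hypothetical thesaurus: "file" is ambiguous (two concept groups).
thesaurus = {"retrieval": ["C1"], "index": ["C2"], "file": ["C2", "C3"]}

document = concept_weights(
    ["retrieval", "retrieval", "retrieval", "index", "file", "file"],
    thesaurus,
)
request = concept_weights(["retrieval", "file"], thesaurus)

print(document)  # {'C1': 3.0, 'C2': 2.0, 'C3': 1.0}
print(request)   # {'C1': 1.0, 'C2': 0.5, 'C3': 0.5}
print(round(cosine(request, document), 4))  # 0.982
print(overlap(request, document))           # 1.0
```

In this toy case "retrieval" occurs three times and is unambiguous, so concept `C1` receives weight 3, three times the weight a single occurrence would give. Each occurrence of the ambiguous word "file" contributes only one half to each of its two groups.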