NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Donna K. Harman, editor
National Institute of Standards and Technology

Multilevel Ranking in Large Text Collections Using FAIRS
S-C. Chang, H. Dediu, H. Azzam, M-W. Du

... showing there are no perfect ranking rules that work in all situations. This attribute may have either a positive or negative impact, depending on user intention. Although there is no obviously better default setting for this attribute, we chose positive as the default, so as to give priority to later records, as they are likely to be more timely.

REC_SIZE: The total word count of the record. This attribute may be used to counter (normalize) the size advantage that a larger record may have over smaller ones during rank judgement. A larger record may have more keywords simply because it contains more words. We therefore set its default to have a negative impact on the relevance judgement. Of course, when records are of similar size, this attribute will have minimal effect, and should probably be disabled to improve response time.

WORD_LOC: The location of the first occurrence of the keyword in the record. A negative setting (the default) of this attribute assumes that important words appear at the beginning of a collection or record. For example, headings or titles which contain keywords describing the contents of a document usually appear at the beginning. Of course, this depends solely on how the contents of the information are organized. This serves as another example of the context-sensitivity of the ranking process.

The following table shows the default ranking rules:

Table 1: Default Ranking Attribute Settings

level  Imp  Pop  Freq  Size  ID   Loc
1      ---  ---  ---   ---   ---  ---
3      ---  ---  ---   ---   ---  neg
4      pos  neg  pos   neg   ---  ---
6      ---  ---  ---   ---   pos  ---

The first level, having no attribute values, automatically accounts for coverage if the weight computation is done using Method 1, as described below in section 2.1.9.
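The per-level attribute settings and the Method 1 weight of section 2.1.9 can be sketched as follows. This is a minimal illustration, not the FAIRS implementation: the level numbers and impact values in `RULES` are placeholders rather than a faithful copy of Table 1, and the `attrs_for` lookup is a hypothetical interface for obtaining a keyword's attribute values within a record.

```python
# Illustrative encoding of per-level attribute impacts:
# +1 = positive, -1 = negative, 0 = neutral/disabled.
ATTRS = ("Imp", "Pop", "Freq", "Size", "ID", "Loc")

RULES = {
    1: {},                                              # no attributes: pure keyword coverage
    4: {"Imp": +1, "Freq": +1, "Size": -1, "Loc": -1},  # placeholder row
}

def signs_for(level):
    """Expand one rule row into the sign vector s_{l,1}..s_{l,A}."""
    rule = RULES.get(level, {})
    return tuple(rule.get(a, 0) for a in ATTRS)

def level_weight(query_keywords, attrs_for, signs):
    """Method-1-style weight of one record at one level: the sum over
    query keywords of the product of the attribute values raised to
    their signs (a negative impact divides, a neutral one contributes
    a factor of 1)."""
    total = 0.0
    for kw in query_keywords:
        attrs = attrs_for(kw)        # a_1..a_A for this keyword, or None
        if attrs is None:            # e[i] = 0: keyword absent from record
            continue
        term = 1.0
        for a, s in zip(attrs, signs):
            term *= a ** s
        total += term
    return total
```

With the empty rule on level 1, every attribute sign is 0, so each matching keyword contributes exactly 1 and the level-1 weight reduces to coverage; lower levels then break ties between records with equal coverage.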
To perform the TREC92 experiments, we changed the ranking rules as follows: Size was introduced on level one to balance the effect of the very large records found in the Federal Register. Consequently, we also had to specify impacts for the Importance, Popularity and Frequency attributes on the first level, since they are the most important criteria for ranking. The following table shows the TREC92 ranking rules:

Table 2: Ranking Attribute Settings for TREC92

level  Imp  Pop  Freq  Size  ID   Loc
1      pos  neg  pos   neg   ---  ---
2      pos  neg  ---   neg   ---  ---
3      pos  neg  pos   neg   ---  ---
5      ---  ---  ---   ---   pos  ---

2.1.9 Weight Computation

Two methods were available to compute the weight of a document:

* Method 1

The weight of a retrieved record r at level l is determined by the following formula:

    W_l[r] = \sum_{i=1}^{K} e[i] \prod_{j=1}^{A} a_j^{s_{l,j}}

Where:

    e[i]    = 1 if the ith keyword exists in the record and 0 otherwise,
    a_j     = the value of attribute j,
    A       = number of attributes (currently 6),
    K       = total number of query keywords,
    s_{l,j} = 1 if the value of attribute j is positive, 0 if it is neutral, -1 if it is negative.

In other words, the weight at a certain level is the sum, over the query keywords, of the product of the attributes at that level. This weight computation method automatically accounts for coverage since for each keyword e[i], the product of attributes is never 0. The value of each attribute is configured by the user either before running FAIRS, or before delivering the query. To set the value before running FAIRS, a file must be created containing the initial values. A default value is set during system start-up.

* Method 2

One disadvantage of Method 1 is that it lacks a common reference scale that evenly distributes the influence of the attributes on the weight of the document. For example, consider a textbase with one very large record, record