NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)

Multilevel Ranking in Large Text Collections Using FAIRS

S-C. Chang, H. Dediu, H. Azzam, M-W. Du

National Institute of Standards and Technology
Donna K. Harman

should be customized. After examining a preview or the full text of the highly ranked records, the user can then revise the query or even change the ranking strategy.

2.1.6 Synonyms

A synonym definition capability (glossary) is also available within FAIRS. Given a word, FAIRS will retrieve instances of that word, as well as instances of its synonyms. The option can be invoked to broaden a query. Users can build their own vocabulary or invoke and modify the system-wide synonym dictionary. This can be used to ease the problem of different word usages from different people.

A very fast elastic string matching algorithm² [5] is being evaluated for inclusion among the query expansion features of FAIRS.

2.1.7 Displaying Records

When a request is made, users will always be presented with the retrieved records in their full-text form. Furthermore, FAIRS provides several ways to associate non-text information with each record. There are basically two types of links: implicit and explicit. Implicit links use source information as-is, while explicit links involve special fields embedded in the source text. Implicit links are useful in situations where an implied one-to-one mapping may be established between records and image files. Explicit links may be used to express one-to-many relations between records and other media.

2.1.8 Ranking

One of the most interesting aspects of FAIRS is its unconventional ranking scheme to determine the relative relevance of retrieved records. The ranking scheme is designed to mimic the human relevancy judgement process. When a person is asked to determine the relative relevance between two records, he is likely to first weigh them using a set of criteria.
If the two records have the same weight under that set of criteria, a secondary set of criteria may be used to differentiate them, and so on. The criteria used can be highly heuristic. The adaptation of this "multilevel ranking scheme" has been filed with the Patent Office in the United States.

To enable FAIRS to use free association in place of Boolean semantics, a multilevel ranking model [1,2] for full-text information retrieval has been developed and implemented. FAIRS ranks records with respect to a particular query according to a set of rules. The default rules consist of six attributes in six levels. The six attributes are the importance, popularity, frequency, and location of a search word, and the size and record ID of the record it occurs in. Each attribute may have a positive, negative, or no (neutral) impact on the relevance judgement of a record. Such an arrangement also guarantees the automatic consideration of coverage (i.e., the percentage of different query words covered by a record), which is next to impossible to implement in a Boolean environment [1,2]. The ranking rules of FAIRS are always accessible and modifiable by the user.

2. Patent Pending

Descriptions of the attributes chosen for FAIRS follow:

FREQUENCY: The number of occurrences of the keyword in the record. This attribute may be used to reflect interest in finding records with more repetitions of a given term. That is, when set to have a positive impact, the more instances of a term in a record, the more relevant the record. Therefore, the record with the higher frequency of a term is more likely to be retrieved.

IMPORTANCE: FAIRS provides the searcher the ability to assign an arbitrary weight (importance) to each query keyword; thus the user has additional control over how records are retrieved. For example, specifying breakfast:4 assigns a weight of 4 to the term breakfast in a query.
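As a rough illustration of the tie-breaking idea and the term:weight syntax described so far, the following Python sketch ranks records level by level. It is not FAIRS code: the function names, the way each attribute is computed for a record, and the weight of 1 given to unweighted terms are all assumptions made for this example, and only two of the six attributes are modelled.

```python
from collections import Counter


def parse_query(query):
    """Split a query into (term, weight) pairs; weight is None when not given."""
    pairs = []
    for token in query.split():
        term, sep, weight = token.partition(":")
        pairs.append((term.lower(), int(weight) if sep else None))
    return pairs


def rank(records, query, levels):
    """Return record IDs ordered by a list of (attribute, impact) levels.

    impact is +1 (positive), -1 (negative) or 0 (neutral, level skipped).
    A later level only breaks ties left by the earlier levels.
    """
    terms = parse_query(query)

    def attributes(text):
        counts = Counter(text.lower().split())
        return {
            # Assumption: importance = summed weights of query terms present;
            # unweighted terms count as 1 purely for simplicity.
            "importance": sum(
                (1 if w is None else w) for t, w in terms if counts[t] > 0
            ),
            # Assumption: frequency = total occurrences of all query terms.
            "frequency": sum(counts[t] for t, _ in terms),
        }

    def key(item):
        _, text = item
        attrs = attributes(text)
        # Negate so that a positive impact puts larger values first.
        return tuple(-impact * attrs[name] for name, impact in levels if impact)

    return [rec_id for rec_id, _ in sorted(records.items(), key=key)]


# Hypothetical three-record collection.
records = {
    1: "breakfast cereal breakfast",
    2: "breakfast menu",
    3: "cereal cereal cereal",
}
print(rank(records, "breakfast:4 cereal", [("importance", 1), ("frequency", 1)]))
# [1, 2, 3]
print(rank(records, "breakfast:4 cereal", [("frequency", 1), ("importance", 1)]))
# [1, 3, 2]
```

In the second call, records 1 and 3 tie on frequency, so the importance level decides between them. Expressing the levels as a tuple sort key is one convenient way to get this lexicographic behaviour; a real system would likely compute the attributes from an index instead of rescanning the text.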
In general, keyword weighting allows the searcher to change how FAIRS sorts records in order to identify the most relevant ones. If a keyword is not weighted, FAIRS supplies a default weight that is inversely proportional to its input position in the query (words that come to mind first deserve more weight). Therefore, if the searcher chooses to weight some or all keywords in a query with a range of weights, records strong in the heavily weighted keywords are ranked before others. By default, this attribute has a positive impact on the relevance judgement.

POPULARITY: This is the number of records in the entire collection that contain a term, as opposed to the number of times the term appears in the retrieved record. For example, if the word software appears at least once in 15 records in a collection, its popularity is considered to be 15. This attribute is usually used in the negative sense and, by default, FAIRS assumes that the more popular a term is, the less effective it is in retrieval.

REC_ID: The record ID is the location of a record in the collection. It may indicate the age of the record. This is also useful when the records are arranged according to their degree of significance (in either increasing or decreasing order). This attribute is a good example for