SP500207
NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Multilevel Ranking in Large Text Collections Using FAIRS
chapter
S-C. Chang
H. Dediu
H. Azzam
M-W. Du
National Institute of Standards and Technology
Donna K. Harman
Multilevel Ranking in Large Text
Collections Using FAIRS
S-C. Chang, H. Dediu, H. Azzam, M-W. Du
GTE Laboratories
Abstract
A description of a general-purpose multilevel ranking
information retrieval prototype is presented. The methods
used in weighting and ranking the retrieved documents are
discussed. Experiments with the TREC92 collection of
text and queries have been conducted without manual
preprocessing. Initial results have shown the multilevel
ranking scheme to be highly competitive in precision and
recall relative to other ranking strategies.
1.0 Introduction
Information retrieval research at GTE Laboratories has led
to the development of a prototype system called FAIRS
(Friendly Adaptable Information Retrieval System).
FAIRS has evolved into a functional and flexible system
currently running on SunOS, HP-UX, Ultrix, AIX, VMS
and PC platforms. It has been used in environments as
diverse as literature searching, library operations, research
and development, customer support, market analysis and
management. FAIRS is being further tested as a retrieval
engine for very large collections of text, such as those
presented by the TREC92 collection and wide-area
distributed collections.
FAIRS is designed to minimize user effort in the
preparation of text and the learning of query syntax, while
providing a user-modifiable multilevel ranking scheme.¹
Experiments with the TREC92 collections of text and queries
have been conducted with no human intervention in the
processing of either text or queries. The results of
experiments against the collection of Wall Street Journal
articles are listed in Section 3A.
¹ Patent pending.
2.0 System Overview
FAIRS uses pure text as its information base, while
allowing flexible links into non-textual information.
FAIRS extracts information out of an unstructured,
amorphous collection of data, in four main steps:
1. First, it logically partitions a raw text file, or a
collection of such files, into retrievable record units.
This simple record partitioning is necessary and
sufficient for indexing to begin. The goal is to use
source information as-is [1].
2. Second, FAIRS automatically constructs an index.
Deletions from an existing index are permitted.
Statistics on the collections can also be generated at
this time. Such statistics are used in normalizing the
weighting of retrieved documents.
3. The third step involves the user queries. Queries are
accepted and interpreted in an intelligent and
sensitive manner [1,2]. A flexible approach to the
understanding of the query is essential to providing
good responses.
4. Finally, once the query is processed, the relevant hits
are retrieved quickly and displayed in an ordered list
ranked according to a relevance measure. The
records corresponding to the hits can then be viewed,
printed or mailed in their entirety on demand.
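The four steps above can be illustrated with a minimal sketch. It is not the FAIRS implementation: the blank-line record separator, the function names (partition, build_index, search) and the simple term-frequency times inverse-document-frequency score are all assumptions chosen for illustration, standing in for the multilevel ranking scheme, which this overview does not specify.

```python
import math
import re
from collections import Counter, defaultdict


def partition(raw_text):
    """Step 1: split raw text into record units.
    Assumption: records are separated by blank lines."""
    return [r.strip() for r in re.split(r"\n\s*\n", raw_text) if r.strip()]


def tokenize(text):
    """Lowercase the text and keep alphanumeric runs as terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(records):
    """Step 2: build an inverted index (term -> {record id: count})
    and gather a collection statistic used later for normalization."""
    index = defaultdict(dict)
    for rid, record in enumerate(records):
        for term, count in Counter(tokenize(record)).items():
            index[term][rid] = count
    stats = {"num_records": len(records)}
    return index, stats


def search(query, index, stats):
    """Steps 3 and 4: interpret a free-text query and return
    (record id, score) pairs, best first. The tf-idf style score
    is a hypothetical stand-in for the relevance measure."""
    scores = defaultdict(float)
    n = stats["num_records"]
    for term in set(tokenize(query)):
        postings = index.get(term, {})
        if not postings:
            continue  # term does not occur in the collection
        idf = math.log(1 + n / len(postings))
        for rid, count in postings.items():
            scores[rid] += count * idf
    # Sort by descending score, breaking ties by record id.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


raw = (
    "Stock prices rose sharply.\n\n"
    "The library added new journals.\n\n"
    "Stock analysts expect prices to fall."
)
records = partition(raw)
index, stats = build_index(records)
hits = search("stock prices", index, stats)
for rid, score in hits:
    print(rid, round(score, 3), records[rid])
```

In this sketch the collection statistic (the number of records) plays the normalizing role described in step 2: terms that occur in fewer records contribute more to a record's score.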
2.1 Characteristics
In FAIRS, both the responses to information requests
and the way relevance is determined can be customized
by the user. FAIRS can serve as a decision-making tool
by presenting all the relevant facts to the user in an
elegant and timely fashion. Each of these strategies
involves interesting and novel features of FAIRS, along
with associated research issues. They will be discussed
in sequence.