SP500207
NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
TIPSTER Panel -- The University of Massachusetts TIPSTER Project
W. B. Croft
National Institute of Standards and Technology
Donna K. Harman
The University of Massachusetts TIPSTER Project
W. Bruce Croft
Computer Science Department
University of Massachusetts
Amherst, MA 01003
The TIPSTER project in the Information Retrieval Laboratory of the Computer Science
Department, University of Massachusetts, Amherst (which includes MCC and David Lewis
of the University of Chicago as subcontractors) is focusing on the following goals:
* Improving the effectiveness of information retrieval techniques for large, full-text
databases,
* Improving the effectiveness of routing techniques appropriate for long-term information
needs, and
* Demonstrating the effectiveness of these retrieval and routing techniques for Japanese
full-text databases.
Our general approach to achieving these goals has been to use improved representations
of text and information needs in the framework of a new model of retrieval. Retrieval (and
routing) is viewed as a probabilistic inference process which "compares" text representations
based on different forms of linguistic and statistical evidence to representations of
information needs based on similar evidence from natural language queries and user
interaction. New techniques for learning (relevance feedback) and extracting term relationships
from text are also being studied. The details and evaluation (with smaller test databases)
of the new model, known as the inference net model, can be found in other papers [3, 2, 4].
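To make the idea of combining several forms of evidence more concrete, here is a minimal sketch, assuming each document has already been assigned a belief in [0, 1] for each query concept. The combination rules below (probabilistic AND, OR, NOT, and a weighted sum) are the closed-form operators commonly described for inference network retrieval; the function names, concept names, and belief values are hypothetical illustrations, not the UMass implementation.

```python
from math import prod  # requires Python 3.8+

# Hypothetical belief-combination operators for an inference-net style query.
# Each belief is assumed to be a probability in [0, 1].

def bel_and(beliefs):
    """All parent concepts must hold: product of beliefs."""
    return prod(beliefs)

def bel_or(beliefs):
    """At least one parent concept holds: 1 - product of complements."""
    return 1.0 - prod(1.0 - p for p in beliefs)

def bel_not(p):
    """Negation of a single belief."""
    return 1.0 - p

def bel_wsum(weighted):
    """Weighted average of (weight, belief) pairs."""
    total_w = sum(w for w, _ in weighted)
    return sum(w * p for w, p in weighted) / total_w

# Hypothetical per-document beliefs for three query concepts.
doc_beliefs = {"japanese": 0.7, "morphology": 0.5, "retrieval": 0.6}

# A structured query roughly of the form #wsum(2 #and(japanese morphology) 1 retrieval).
score = bel_wsum([
    (2.0, bel_and([doc_beliefs["japanese"], doc_beliefs["morphology"]])),
    (1.0, doc_beliefs["retrieval"]),
])
print(round(score, 4))  # 0.4333
```

Documents would then be ranked by this final belief; the point of the sketch is only that linguistic and statistical evidence enter as separate beliefs and are combined according to the query's structure.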
Some of the specific research issues we are addressing are morphological analysis in English
and Japanese, word sense disambiguation in English, the use of phrases and other
syntactic structure in English and Japanese, the use of special purpose recognizers in
representing documents and queries, analyzing natural language queries to build structured
representations of information needs, learning techniques appropriate for routing and
structured queries, and probability estimation techniques for indexing.
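As one illustration of learning from relevance judgments in a routing setting, where a long-standing query is refined using documents already judged relevant or non-relevant, here is a sketch of Rocchio query modification. Rocchio is a classic vector-space technique swapped in here for illustration; it is not necessarily the learning method used in this project, and the weights `alpha`, `beta`, and `gamma` and the example term vectors are assumptions.

```python
from collections import defaultdict
from typing import Dict, List

Vector = Dict[str, float]  # term -> weight

def rocchio(query: Vector, relevant: List[Vector], nonrelevant: List[Vector],
            alpha: float = 1.0, beta: float = 0.75, gamma: float = 0.15) -> Vector:
    """Return alpha*q + beta*centroid(relevant) - gamma*centroid(nonrelevant)."""
    new_q: Dict[str, float] = defaultdict(float)
    for term, w in query.items():
        new_q[term] += alpha * w
    # Move the query toward the centroid of the judged-relevant documents.
    for doc in relevant:
        for term, w in doc.items():
            new_q[term] += beta * w / len(relevant)
    # Move it away from the centroid of the judged-non-relevant documents.
    for doc in nonrelevant:
        for term, w in doc.items():
            new_q[term] -= gamma * w / len(nonrelevant)
    # Terms whose weight ends up zero or negative are dropped.
    return {t: w for t, w in new_q.items() if w > 0}

# Hypothetical routing profile refined with three judged documents.
profile = rocchio(
    {"tipster": 1.0},
    relevant=[{"tipster": 1.0, "routing": 1.0}, {"routing": 1.0, "japanese": 1.0}],
    nonrelevant=[{"japanese": 1.0, "weather": 1.0}],
)
print(sorted(profile))  # ['japanese', 'routing', 'tipster']
```

Because routing queries persist over time, this kind of update can be reapplied as new judgments arrive; the structured, probabilistic learning techniques studied in the project would replace the simple vector arithmetic shown here.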
When the TIPSTER experiments are compared with previous IR experiments done using the
standard test collections (e.g., CACM, CISI, NPL), there are a number of interesting
differences:
* The size of the corpus is much larger than previous collections, both in terms of
the number of documents and the amount of text. This presents a challenge to the
robustness and efficiency of experimental information retrieval systems. Experiments
with indexing, for example, can take days instead of minutes.
* The documents in TIPSTER are nearly all full text, rather than abstracts.