NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
National Institute of Standards and Technology
Donna K. Harman

TIPSTER Panel

The University of Massachusetts TIPSTER Project

W. Bruce Croft
Computer Science Department
University of Massachusetts
Amherst, MA 01003

The TIPSTER project in the Information Retrieval Laboratory of the Computer Science Department, University of Massachusetts, Amherst (which includes MCC and David Lewis of the University of Chicago as subcontractors), is focusing on the following goals:

* Improving the effectiveness of information retrieval techniques for large, full-text databases,
* Improving the effectiveness of routing techniques appropriate for long-term information needs, and
* Demonstrating the effectiveness of these retrieval and routing techniques for Japanese full-text databases.

Our general approach to achieving these goals has been to use improved representations of text and information needs in the framework of a new model of retrieval. Retrieval (and routing) is viewed as a probabilistic inference process which "compares" text representations based on different forms of linguistic and statistical evidence to representations of information needs based on similar evidence from natural language queries and user interaction. New techniques for learning (relevance feedback) and extracting term relationships from text are also being studied. The details and evaluation (with smaller test databases) of the new model, known as the inference net model, can be found in other papers [3, 2, 4].
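The idea of ranking documents by combining several kinds of evidence probabilistically can be illustrated with a small sketch. This is not the project's implementation: the function names, the three evidence types, the weights, and the belief values are all hypothetical, and the combination rules shown (product, noisy-or style disjunction, weighted sum) are simple closed forms of the kind commonly used with inference networks, assumed here for illustration only.

```python
from math import prod

def bel_and(beliefs):
    """Belief that all pieces of evidence hold, assuming independence."""
    return prod(beliefs)

def bel_or(beliefs):
    """Belief that at least one piece of evidence holds, assuming independence."""
    return 1.0 - prod(1.0 - b for b in beliefs)

def bel_wsum(beliefs, weights):
    """Weighted average of beliefs; weights express how much each source is trusted."""
    total = sum(weights)
    return sum(w * b for w, b in zip(weights, beliefs)) / total

def rank(doc_beliefs, weights):
    """Order document ids by weighted-sum belief, highest first."""
    scored = {doc: bel_wsum(b, weights) for doc, b in doc_beliefs.items()}
    return sorted(scored, key=scored.get, reverse=True)

# Hypothetical beliefs per document from three evidence sources
# (for example word stems, phrases, special purpose recognizers).
doc_beliefs = {
    "d1": [0.8, 0.4, 0.4],
    "d2": [0.6, 0.9, 0.5],
    "d3": [0.4, 0.4, 0.9],
}
weights = [2.0, 1.0, 1.0]  # assumed weights, not taken from the paper

ranking = rank(doc_beliefs, weights)
print(ranking)
```

In this toy setting a document with moderate support from every source can outrank one with strong support from a single source, which is the practical motivation for combining multiple representations rather than relying on one.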
Some of the specific research issues we are addressing are morphological analysis in English and Japanese, word sense disambiguation in English, the use of phrases and other syntactic structure in English and Japanese, the use of special purpose recognizers in representing documents and queries, analyzing natural language queries to build structured representations of information needs, learning techniques appropriate for routing and structured queries, and probability estimation techniques for indexing.

Comparing the TIPSTER experiments to previous IR experiments done using the standard test collections (e.g. CACM, CISI, NPL, etc.), there are a number of interesting differences:

* The size of the corpus is much larger than previous collections, both in terms of the number of documents and the amount of text. This presents a challenge to the robustness and efficiency of experimental information retrieval systems. Experiments with indexing, for example, can take days instead of minutes.
* The documents in TIPSTER are nearly all full text, rather than abstracts.