SP500215
NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)
DR-LINK: A System Update for TREC-2
chapter
E. Liddy
S. Myaeng
National Institute of Standards and Technology
D. K. Harman
While the original motivation was to represent documents and topic statements at a more conceptual level using RIT
codes, we are also testing the effectiveness of RIT-based term expansion in IR environments. Using the scheme we
have developed for term clustering using contextual information in the corpus (Myaeng & Li, 1992), we have three
methods to evaluate: RIT-based expansion, term-cluster-based expansion, and a combination of the two, intended to
eliminate both the problems of using a general-purpose thesaurus and the errors made by the term-clustering method.
For the TIPSTER evaluation, we have submitted two sets of four runs: one with RIT codes and the other without them.
Each set consists of three runs using different scoring schemes and a fourth run that combines the three,
which appears to produce the best result in our internal experiment.
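The three expansion methods can be illustrated with a minimal sketch. It is not the DR-LINK implementation: the function name, the toy dictionaries, and in particular the choice to combine the two sources by set intersection are assumptions made for illustration, since the text does not specify how the combination is computed.

```python
from typing import Dict, List, Set


def expand_terms(
    query_terms: List[str],
    thesaurus: Dict[str, Set[str]],
    clusters: Dict[str, Set[str]],
    method: str = "combined",
) -> List[str]:
    """Expand query terms using one of three hypothetical strategies.

    thesaurus: term -> related terms from a general-purpose thesaurus
               (a toy stand-in for RIT categories).
    clusters:  term -> terms grouped with it by corpus-based clustering.
    method:    "thesaurus", "cluster", or "combined".
    """
    expanded: List[str] = []
    seen: Set[str] = set()
    for term in query_terms:
        from_thesaurus = thesaurus.get(term, set())
        from_clusters = clusters.get(term, set())
        if method == "thesaurus":
            candidates = from_thesaurus
        elif method == "cluster":
            candidates = from_clusters
        elif method == "combined":
            # Assumption: keep only terms that both sources agree on, so
            # each source filters the other's spurious suggestions.
            candidates = from_thesaurus & from_clusters
        else:
            raise ValueError(f"unknown method: {method}")
        # Keep the original term first, then candidates in sorted order.
        for candidate in [term] + sorted(candidates):
            if candidate not in seen:
                seen.add(candidate)
                expanded.append(candidate)
    return expanded


# Toy data, invented for this example only.
TOY_THESAURUS = {"bank": {"shore", "lender", "depository"}}
TOY_CLUSTERS = {"bank": {"lender", "loan", "depository"}}

thesaurus_only = expand_terms(["bank"], TOY_THESAURUS, TOY_CLUSTERS, "thesaurus")
cluster_only = expand_terms(["bank"], TOY_THESAURUS, TOY_CLUSTERS, "cluster")
combined = expand_terms(["bank"], TOY_THESAURUS, TOY_CLUSTERS, "combined")
```

In this toy case the combined method drops "shore" (a thesaurus sense not supported by the corpus) and "loan" (a cluster neighbor that is not a thesaurus relative), which is one way the two sources could offset each other's weaknesses.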
3. Test Runs
The DR-LINK group elected to put their efforts into continued work for the twenty-four month TIPSTER testing,
and as a result we lost our opportunity to have TREC-compatible results to discuss at this time. Although our
twenty-four month TIPSTER runs have been submitted, many of our top-ranked documents were not amongst those
submitted by TREC participants, so it is virtually impossible to make even unofficial reports on our system's
performance. We trust that in the near future there will be some comparable groups and/or runs to measure ourselves
against after the results from both TIPSTER and TREC-2 are available.
4. Summary
As the above descriptions should convey, we have made a great deal of progress in the development and integration
of the DR-LINK System since TREC-1. Unfortunately, the absence of quantified results of our performance limits
our convincing power. However, we are pleased to have demonstrated that a system implementation of our original
notion of integrating multiple levels of linguistic processing so that retrieval can be conducted at a conceptual rather
than word-based level is nearly achieved.
Many rich research and implementation ideas remain to be explored in all of the DR-LINK modules, particularly
those which have only been in existence for a few months.
The research reported herein was funded by ARPA's TIPSTER Program. We are grateful both to Longmans for
access to the machine-readable tape of LDOCE and to BBN for the loan of their POST tagger.
Without the untiring efforts of the following individuals, neither the DR-LINK System nor our very promising
results would have been possible: Margot Clark, Bob Del Zoppo, Saleh Elmohammed, Chris Khoo, Tracey Lemon,
Ming Li, Mary McKenna, Ken McVearry, Carin Obad, Woojin Paik, Ching-cheng Shih, Joe Woeltel, Edmund Yu,
Ahsan Zhia.
Cook, W. (1989). Case Grammar Theory. Washington, D.C.: Georgetown University Press.
Harman, D. (Ed.) (1993). The First Text REtrieval Conference (TREC-1). National Institute of Standards and
Technology.
Liddy, E.D., Jorgensen, C.L., Sibert, E. & Yu, E.S. (1991). Sublanguage grammar in natural language
processing. Proceedings of RIAO '91. Barcelona.
Liddy, E.D., Paik, W. & Yu, E.S. (1993). Document filtering using semantic information from a machine readable
dictionary. Proceedings of the ACL Workshop on Very Large Corpora.