NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)

DR-LINK: A System Update for TREC-2
E. Liddy and S. Myaeng
Edited by D. K. Harman, National Institute of Standards and Technology

While our original motivation was to represent documents and topic statements at a more conceptual level using RIT codes, we are also testing the effectiveness of RIT-based term expansion in IR environments. Using the scheme we have developed for term clustering based on contextual information in the corpus (Myaeng & Li, 1992), we have three methods to evaluate:

- RIT-based expansion.
- Term-cluster-based expansion.
- A combination of the two. The combination is intended to eliminate both the problems of relying on a general-purpose thesaurus and the errors made by the term-clustering method.

For the TIPSTER evaluation, we have submitted two sets of four runs, one set with RIT codes and the other without them. Each set consists of three runs using different scoring schemes, plus a fourth run that combines the three. In our internal experiment, this combined run appeared to produce the best result.

3. Test Results

The DR-LINK group elected to concentrate its efforts on continued work for the twenty-four-month TIPSTER testing. As a result, we lost the opportunity to have TREC-compatible results to discuss at this time. Our twenty-four-month TIPSTER runs have been submitted. However, many of our top-ranked documents were not among those submitted by TREC participants, so they were never judged for relevance. It is therefore virtually impossible to make even unofficial reports on our system's performance. We trust that, once the results from both TIPSTER and TREC-2 are available, there will be comparable groups and/or runs against which to measure ourselves.

4. Summary

As the above descriptions should convey, we have made a great deal of progress in the development and integration of the DR-LINK System since TREC-1. Unfortunately, the absence of quantified results on our performance limits how convincingly we can make this case.
However, we are pleased to have demonstrated that we have nearly achieved a system implementation of our original idea. That idea is to integrate multiple levels of linguistic processing so that retrieval can be conducted at a conceptual rather than a word-based level. Many rich research and implementation ideas remain to be explored in all of the DR-LINK modules, particularly those that have existed for only a few months.

Acknowledgments

The research reported herein was funded by ARPA's TIPSTER Program. We are grateful to Longman for access to the machine-readable tape of LDOCE and to BBN for the loan of their POST tagger. Without the untiring efforts of the following individuals, neither the DR-LINK System nor our very promising results would have been possible: Margot Clark, Bob Del Zoppo, Saleh Elmohammed, Chris Khoo, Tracey Lemon, Ming Li, Mary McKenna, Ken McVearry, Carin Obad, Woojin Paik, Ching-cheng Shih, Joe Woelfel, Edmund Yu, and Ahsan Zhia.

References

Cook, W. (1989). Case Grammar Theory. Washington, D.C.: Georgetown University Press.
Harman, D. (Ed.) (1993). The First Text REtrieval Conference (TREC-1). National Institute of Standards and Technology.
Liddy, E.D., Jorgensen, C.L., Sibert, E. & Yu, E.S. (1991). Sublanguage grammar in natural language processing. Proceedings of RIAO '91, Barcelona.
Liddy, E.D., Paik, W. & Yu, E.S. (1993). Document filtering using semantic information from a machine readable dictionary. Proceedings of the ACL Workshop on Very Large Corpora.