SP500207
NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
TIPSTER Panel -- DR LINK's Linguistic-Conceptual Approach to Document Detection
chapter
E. Liddy
S. Myaeng
National Institute of Standards and Technology
Donna K. Harman
Conceptual Graph Matcher Testing
As a first prototype, the CG Matcher was implemented with a set of scoring heuristics whose
theoretical basis is the Dempster-Shafer theory of evidence. The score for a document is
computed progressively, from node-level evidence through multiple stages, to generate scores
for text units of different granularities.
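To illustrate the kind of evidence combination that the Dempster-Shafer theory provides, the following is a minimal sketch of Dempster's rule of combination applied to node-level evidence. It is not the CG Matcher's actual set of heuristics, which are not specified here. The two-element frame (relevant / not relevant), the function names, and the mapping from a node match strength to a mass function are all assumptions made for illustration.

```python
from itertools import product

# Assumed frame of discernment for a text unit: relevant (R) or not relevant (N).
R = frozenset({"R"})
N = frozenset({"N"})
THETA = R | N  # the whole frame, i.e. uncommitted belief


def combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    Each mass function is a dict mapping a frozenset (a subset of the frame)
    to its mass; the masses in each dict are assumed to sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + x * y
        else:
            # Mass falling on the empty set measures conflicting evidence.
            conflict += x * y
    if conflict >= 1.0:
        raise ValueError("totally conflicting evidence cannot be combined")
    # Renormalize so the remaining masses sum to 1.
    return {s: v / (1.0 - conflict) for s, v in combined.items()}


def node_evidence(support):
    """Hypothetical mapping: a node match strength in [0, 1] becomes a
    simple support function in favor of relevance."""
    return {R: support, THETA: 1.0 - support}


def unit_score(node_supports):
    """Combine node-level evidence into a belief in relevance for one text unit."""
    m = {THETA: 1.0}  # start from total ignorance
    for s in node_supports:
        m = combine(m, node_evidence(s))
    return m.get(R, 0.0)
```

Under these assumptions, two matching nodes with strength 0.5 each give `unit_score([0.5, 0.5])` a belief of 0.75 in relevance. The same `combine` function could in principle be reapplied at each stage, pooling the mass functions of smaller text units into one for a larger unit.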
To test the feasibility of the prototype module, we ran it with manually generated
CGs for twenty documents and five topic statements for which we have relevance judgments.
While it is premature to draw any conclusions about the efficacy of the matching algorithm and the
heuristics, mainly because of the size of the data set and the stage of development, the results
were encouraging and have given us much insight into how the scoring heuristics need to
be tuned.
Conclusions
This paper on the DR-LINK system should be considered a report on work in progress, since
we did not have a fully developed system at the time of the TREC testing. However, we do believe
that the three system components which were tested perform quite respectably, given their
innovativeness. Continued development and feedback from the TREC results will yield much
more refined versions of these system modules. In addition, two system modules remain to be
developed, and the full system, which is quite synergistic in its approach to achieving its goals,
remains to be integrated and tested as a whole. The Relation Concept Detector and
Conceptual Graph Generator modules are being implemented in tandem and, when completed, will
make the DR-LINK system fully operational. Full system testing will be conducted as part of the
eighteen-month TIPSTER testing.
In the interim, our goal in this paper has been to describe the five unique modules which
comprise DR-LINK and which, in combination, promise to provide a full system with the
filtering power needed to make later processing more accurate and the depth of linguistic
processing required for real conceptual-level matching and retrieval.