NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
TIPSTER Panel -- DR LINK's Linguistic-Conceptual Approach to Document Detection
E. Liddy
S. Myaeng
National Institute of Standards and Technology
Donna K. Harman
3.a. Subject Field Coder Testing
The Subject Field Coder can be evaluated in its filtering function in the following way: the SFC
vectors of documents in a database are compared to a topic statement vector and, for each topic
statement, ranked according to similarity. The question then is, how far down this ranked list
would the system need to proceed in order to include all the relevant documents in the set of
documents that was passed on to the next system module? This testing procedure has been run
using the WSJ and Ziff collections on Disk 1 and Topic Statements 1 to 50 provided by TREC. In
the Wall Street Journal database, results showed that on average across the 50 topic statements,
all the relevant documents were ranked in the top 28% of the list. Therefore, on average, 72%
of the WSJ database did not need to be further processed by the later modules in the system. In
the Ziff database, on average across the 50 topic statements, all the relevant documents were
ranked in the top 66% of the database, a much poorer result. Therefore, on average, 34% of the
Ziff database need not be further processed. When Topic Statements 51 to 100 were searched on
the WSJ database, all the relevant documents were ranked in the top 32% of the list.
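The filtering evaluation described above can be sketched as follows. This is an illustrative reconstruction, not the actual DR-LINK code: the vector contents, cosine similarity measure, and relevance judgments shown here are assumed stand-ins for the real SFC vectors and TREC judgments.

```python
# Sketch of the filtering evaluation: rank documents by similarity of
# their SFC vectors to the topic-statement vector, then find how deep
# in the ranked list the last relevant document falls.

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0

def filtering_cutoff(topic_vec, doc_vecs, relevant_ids):
    """Fraction of the ranked list that must be passed to the next
    module so that every relevant document is included."""
    ranked = sorted(doc_vecs,
                    key=lambda d: cosine(topic_vec, doc_vecs[d]),
                    reverse=True)
    # Depth (0-based rank) of the lowest-ranked relevant document.
    deepest = max(ranked.index(d) for d in relevant_ids)
    return (deepest + 1) / len(ranked)

# Toy example (hypothetical 3-slot vectors and relevance set):
topic = [1.0, 0.0, 0.0]
docs = {"d1": [1.0, 0.0, 0.0], "d2": [0.0, 1.0, 0.0],
        "d3": [0.9, 0.1, 0.0], "d4": [0.0, 0.0, 1.0]}
print(filtering_cutoff(topic, docs, {"d1", "d3"}))  # -> 0.5
```

A cutoff of 0.28 for a topic would correspond to the reported WSJ result: all relevant documents fell in the top 28% of the list, so the remaining 72% need not be processed further.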
Error analysis of the Ziff results has revealed the cause of the poorer performance on that
database: for several of the queries, documents were judged relevant only if they
contained the particular proper noun mentioned in the topic statement (e.g. OS/2, Mitsubishi,
IBM's SAA standards, etc.). Given these types of topic statements, the most appropriate search
approach for a system would be keyword matching, which is not at all the type of matching that
is done using the SFC representation. The SFCs represent a document at a higher level of
abstraction, not at the keyword level. That is, documents which discuss a particular computer
will have a strong weighting of the Data Processing slot on the SFC vector, but no means for
matching on a particular computer name. Therefore, the error analysis showed that the SFC
performance was hampered by its inability to match at the level of a specific company name or
product. Fortunately, we do have at hand the means to improve the results, as we are in the
process of incorporating a second level of document ranking using Proper Noun processing
algorithms. Results using this extended representation will be available at the eighteen-month
TIPSTER meeting.
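One plausible way to combine the two levels of ranking is sketched below. The paper states only that Proper Noun processing is being incorporated as a second level; the specific combination rule here (a stable re-sort by proper-noun overlap on top of the SFC ordering) is an assumption for illustration, as are all names and data.

```python
# Hypothetical second-level re-ranking: documents that survive the SFC
# filter are re-sorted by how many of the topic statement's proper nouns
# (e.g. 'OS/2', 'Mitsubishi') they contain. Python's sort is stable, so
# documents with equal overlap keep their SFC-similarity order.

def rerank_by_proper_nouns(sfc_ranked, doc_nouns, topic_nouns):
    """sfc_ranked: doc ids in descending SFC-similarity order.
    doc_nouns: doc id -> set of proper nouns found in that document.
    topic_nouns: set of proper nouns extracted from the topic statement."""
    def overlap(doc_id):
        return len(topic_nouns & doc_nouns.get(doc_id, set()))
    return sorted(sfc_ranked, key=overlap, reverse=True)

# Toy example: 'c' mentions both topic nouns, 'b' one, 'a' none.
order = rerank_by_proper_nouns(
    ["a", "b", "c"],
    {"a": set(), "b": {"OS/2"}, "c": {"OS/2", "IBM"}},
    {"OS/2", "IBM"})
print(order)  # -> ['c', 'b', 'a']
```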
3.b. Text Structurer Testing
The Text Structurer was tested using five of the six evidence sources, as we have not yet
implemented the algorithms for incorporating evidence from the Continuation Clues. We tested
the Text Structurer on a set of 116 WSJ documents, consisting of several thousand sentences.
This first test resulted in 72% of the sentences being correctly assigned to their discourse
components. An additional hand simulation of one small, heuristic adjustment improved the
system's performance to 74%. A second run on a smaller sample of sentences yielded correct
component identification for 80% of the sentences. Ongoing
efforts at improving the quality of the evidence sources used by the Text Structurer, plus the
incorporation of the Continuation Clue evidence, promise to improve these results significantly.
It should be remembered, as well, that the automatic discourse structuring of documents has not
been reported elsewhere in the literature; this very new type of text processing is in its
infancy and likely has much room for improvement.