NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
TIPSTER Panel -- DR LINK's Linguistic-Conceptual Approach to Document Detection
E. Liddy
S. Myaeng
National Institute of Standards and Technology
Donna K. Harman
3.a. Subject Field Coder Testing
The Subject Field Coder can be evaluated in its filtering function in the following way: the SFC
vectors of documents in a database are compared to a topic statement vector and, for each topic
statement, ranked according to similarity. The question then is, how far down this ranked list
would the system need to proceed in order to include all the relevant documents in the set of
documents that was passed on to the next system module? This testing procedure has been run
using the WSJ and Ziff collections on Disk 1 and Topic Statements 1 to 50 provided by TREC. In
the Wall Street Journal database, results showed that on average across the 50 topic statements,
all the relevant documents were ranked in the top 28% of the list. Therefore, on average, 72%
of the WSJ database did not need to be further processed by the later modules in the system. In
the Ziff database, on average across the 50 topic statements, all the relevant documents were
ranked in the top 66% of the database, a much poorer result. Therefore, on average, 34% of the
Ziff database need not be further processed. When Topic Statements 51 to 100 were searched on
the WSJ database, all the relevant documents were ranked in the top 32% of the list.
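The filtering evaluation described above can be sketched as follows. This is an illustrative reconstruction, not the actual DR-LINK code: the vector contents, cosine similarity measure, and relevance judgments shown here are assumed stand-ins for the real SFC vectors and TREC judgments.

```python
# Sketch of the filtering evaluation: rank documents by similarity of
# their SFC vectors to the topic-statement vector, then find how deep
# in the ranked list the last relevant document falls.

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0

def filtering_cutoff(topic_vec, doc_vecs, relevant_ids):
    """Fraction of the ranked list that must be passed to the next
    module so that every relevant document is included."""
    ranked = sorted(doc_vecs,
                    key=lambda d: cosine(topic_vec, doc_vecs[d]),
                    reverse=True)
    # Depth (0-based rank) of the lowest-ranked relevant document.
    deepest = max(ranked.index(d) for d in relevant_ids)
    return (deepest + 1) / len(ranked)

# Toy example (hypothetical 3-slot vectors and relevance set):
topic = [1.0, 0.0, 0.0]
docs = {"d1": [1.0, 0.0, 0.0], "d2": [0.0, 1.0, 0.0],
        "d3": [0.9, 0.1, 0.0], "d4": [0.0, 0.0, 1.0]}
print(filtering_cutoff(topic, docs, {"d1", "d3"}))  # -> 0.5
```

A cutoff of 0.28 for a topic would correspond to the reported WSJ result: all relevant documents fell in the top 28% of the list, so the remaining 72% need not be processed further.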
Error analysis of the Ziff results has revealed the cause of the poorer performance on that
database: for several of the queries, documents were judged relevant only if they
contained the particular proper noun mentioned in the topic statement (e.g. OS/2, Mitsubishi,
IBM's SAA standards, etc.). Given these types of topic statements, the most appropriate search
approach for a system would be keyword matching, which is not at all the type of matching that
is done using the SFC representation. The SFCs represent a document at a higher level of
abstraction, not at the keyword level. That is, documents which discuss a particular computer
will have a strong weighting of the Data Processing slot on the SFC vector, but no means for
matching on a particular computer name. Therefore, the error analysis showed that the SFC
performance was hampered by its inability to match at the level of a specific company name or
product. Fortunately, we do have at hand the means to improve the results, as we are in the
process of incorporating a second level of document ranking using Proper Noun processing
algorithms. Results using this extended representation will be available at the eighteen-month
TIPSTER meeting.
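One plausible way to combine the two levels of ranking is sketched below. The paper states only that Proper Noun processing is being incorporated as a second level; the specific combination rule here (a stable re-sort by proper-noun overlap on top of the SFC ordering) is an assumption for illustration, as are all names and data.

```python
# Hypothetical second-level re-ranking: documents that survive the SFC
# filter are re-sorted by how many of the topic statement's proper nouns
# (e.g. 'OS/2', 'Mitsubishi') they contain. Python's sort is stable, so
# documents with equal overlap keep their SFC-similarity order.

def rerank_by_proper_nouns(sfc_ranked, doc_nouns, topic_nouns):
    """sfc_ranked: doc ids in descending SFC-similarity order.
    doc_nouns: doc id -> set of proper nouns found in that document.
    topic_nouns: set of proper nouns extracted from the topic statement."""
    def overlap(doc_id):
        return len(topic_nouns & doc_nouns.get(doc_id, set()))
    return sorted(sfc_ranked, key=overlap, reverse=True)

# Toy example: 'c' mentions both topic nouns, 'b' one, 'a' none.
order = rerank_by_proper_nouns(
    ["a", "b", "c"],
    {"a": set(), "b": {"OS/2"}, "c": {"OS/2", "IBM"}},
    {"OS/2", "IBM"})
print(order)  # -> ['c', 'b', 'a']
```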
3.b. Text Structurer Testing
The Text Structurer was tested using five of the six evidence sources, as we have not yet
implemented the algorithms for incorporating evidence from the Continuation Clues. We tested
the Text Structurer on a set of 116 WSJ documents, consisting of several thousand sentences.
This first test resulted in 72% of the sentences being correctly assigned to their discourse
components. An additional hand simulation of one small, heuristic adjustment improved the
system's performance to 74%. A second run on a smaller sample of sentences yielded correct
component identification for 80% of the sentences. Ongoing
efforts at improving the quality of the evidence sources used by the Text Structurer, plus the
incorporation of the Continuation Clue evidence, promise to improve these results significantly.
It should be remembered, as well, that the automatic discourse structuring of documents has not
been reported elsewhere in the literature; this very new type of text processing is in its
infancy and likely has much room for improvement.