SP500215 NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)
Knowledge-Based Searching with TOPIC chapter
J. Lehman, C. Reid
National Institute of Standards and Technology: D. K. Harman

No markup language (SGML) interpreter was used during data preparation, and the optional alphabetical word list (used only for display) and typographical error index (used almost exclusively for OCR'd data) were not employed. Special indices such as correlated terms and paragraph/sentence positioning were not produced. Because the fuzzy proximity operator was used in the tests, only a word position index was produced. No document was divided into logical or arbitrary sections for processing or search result enhancement, although that approach is used in virtually all non-newswire Verity installations. The purpose of logical division (a forerunner of the intelligence available in a standard markup language) is to create domain-specific logical documents, and thereby to reduce the impact of larger, multi-subject documents on results (such documents would otherwise appear in search results simply because of their breadth of words).

3.2 TOPIC CONSTRUCTION

Verity personnel manually constructed the search rules from the subject area descriptions and the training data. No rule developer was identified or chosen as a subject matter expert, and for certain of the contributors, this was their first experience using Topic. [Search rule libraries are created by approximately 6% of Topic's user population; the remainder of Topic's users employ the topics developed by others.] On average, the TREC-2 volunteers were considered novices on the Topic product, particularly in the area of search rule development. Volunteers were not encouraged to use specific features of the product, and in at least one case, inadequate communication produced potentially inaccurate search expectations.
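As noted above, only a word position index was produced during data preparation, and the fuzzy proximity operator works against that index. The paper does not describe Verity's internal structures, so the following is only a minimal sketch, assuming a simple term-to-positions dictionary and an illustrative word-window "near" lookup:

```python
from collections import defaultdict

def build_position_index(docs):
    """Build a word-position index: term -> {doc_id: [positions]}.

    Illustrative structure only; not Verity's implementation.
    """
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            index[word][doc_id].append(pos)
    return index

def near(index, term_a, term_b, window=5):
    """Fuzzy proximity: doc ids where term_a and term_b occur
    within `window` word positions of each other."""
    hits = set()
    for doc_id in set(index[term_a]) & set(index[term_b]):
        if any(abs(pa - pb) <= window
               for pa in index[term_a][doc_id]
               for pb in index[term_b][doc_id]):
            hits.add(doc_id)
    return hits
```

Because only word positions are stored, proximity can be expressed only in word distances; there is no record of where sentences or paragraphs begin, which is why sentence/paragraph proximity operators could not be supported by this index.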
As search rules were interactively developed, the rule evidence was automatically indexed for repeated use of the rule. The twenty volunteers each produced between 3 and 8 retrospective and routing queries. The time spent on individual query development and result production ranged from fifteen minutes to eight hours, over a several-week period. The average time to produce the TREC-2 result, obtained from interviewing the volunteers, was approximately one hour.

3.3 EXPERIMENT PERFORMANCE

Typical response time performance on the searches was two seconds per 8000-document partition, or approximately two minutes to search the entire collection. A single term, indexed as rule evidence, was used to search the entire collection, and the 1.1 million document collection was searched in 21 seconds. For routing queries, the score threshold was set to zero; any document containing evidence entered the routing result list.

3.4 ANALYSIS OF OFFICIAL RESULTS

The post hoc analysis of Topic's TREC-2 results generally found that the Topic system performed well. When compared with other manual systems, the scores are among the best. In the few cases where Topic appeared to fail, we have generally been able to identify easily correctable deficiencies that, had they been noticed during the experiment proper, would have resulted in superior performance by Topic in TREC-2. Based on our analysis, we believe that the prospects for TREC-3 look very bright. Our analysis of selected results from our TREC-2 submissions focuses mainly on the "failure cases", since these are most likely to give us insights into how to improve Topic's (and users') performance in future TREC experiments. This also allows us to investigate whether there are any fundamental issues with using Topic to model the information need statements used in TREC. We analyzed two routing and three ad hoc topics in detail. Our summary follows.
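The routing runs above set the score threshold to zero, so any document containing at least one piece of evidence entered the result list. The paper does not give Topic's actual scoring function, so the sketch below assumes a simple evidence-overlap fraction purely for illustration; only the threshold behavior is taken from the text:

```python
def route(document_terms, evidence, threshold=0.0):
    """Score a document against a routing query's evidence terms.

    Scoring is a hypothetical evidence-overlap fraction. With
    threshold 0.0, any document matching at least one evidence
    term is admitted to the routing result list.
    """
    matched = document_terms & evidence
    score = len(matched) / len(evidence)
    return score if score > threshold else None  # None: not routed
```

With a zero threshold the result list grows to include every document with any evidence hit, however weak, which is consistent with the exhaustive routing lists described above.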
The following general observations applied to all searches:
- Ad hoc searches were submitted against all three disks, which generally produced poorer quality results, as documents from disk three appeared in some search results.1
- Field value evidence was not used, and in some domains/subject areas, domain knowledge about the sources of information would favor (rank higher) sources with the appropriate use of terminology (e.g., business sources for financial performance, or foreign datelines, which have a higher likelihood of describing prominent foreign persons/activity, as in topics 66 or 121).
- The queries which attempted to use nomenclature with hyphens (e.g. M-1) failed to return an exact match, as the hyphen was not included as an indexed character.
- The fuzzy proximity (near) operator was undocumented; only one volunteer used it, and other users expected sentence/paragraph proximity in their searches. The index did not contain sentence/paragraph positional data, and all uses of sentence or paragraph operators produced erroneous results because the search arbitrarily assigned sentence and paragraph boundaries.

1 Reprocessing the ad hoc searches against only disks 1 and 2 produced a numeric result improvement of 0-70 percent, with a few changes from under the median to over the median.
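The hyphen failure above is a tokenization issue: when "-" is not an indexed character, a term like "M-1" is split at indexing time and an exact match for the hyphenated form can never succeed. A small illustrative tokenizer (the `index_hyphen` parameter is hypothetical, not a Topic option) shows the two behaviors:

```python
import re

def tokenize(text, index_hyphen=False):
    """Split text into indexable terms.

    When the hyphen is not an indexed character, "M-1" is emitted
    as two separate terms ("M", "1"), so an exact search for "M-1"
    cannot match. Including "-" in the character class keeps the
    hyphenated term whole.
    """
    pattern = r"[A-Za-z0-9-]+" if index_hyphen else r"[A-Za-z0-9]+"
    return re.findall(pattern, text)
```

Adding the hyphen to the set of indexed characters (or normalizing hyphenated nomenclature at both index and query time) would be the straightforward fix suggested by this observation.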