NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)
Edited by D. K. Harman, National Institute of Standards and Technology

N-Gram-Based Text Filtering For TREC-2
W. Cavnar

TABLE 1. Selected TREC-2 Test Results

Test      Note                     Test     Query    Negation  Average    At 10   At 100  R-
Number                             Type     Thresh.  Thresh.   Precision  Docs    Docs    Precision
--------  -----------------------  -------  -------  --------  ---------  ------  ------  ---------
erima1    official                 ad hoc   80       80        0.1589     0.3960  0.3016  0.2179
erima2    official                 ad hoc   75       80        0.1885     0.4480  0.3426  0.2494
a30       official                 ad hoc   70       95        0.1604     0.4040  0.3038  0.2198
a31       official w/ X queries    ad hoc   70       95        0.1153     0.4000  0.2590  0.1904
a33       spanning lines           ad hoc   70       95        0.0734     0.1500  0.1484  0.1333
a34       span lines; exp. decay   ad hoc   90       95        0.1099     0.3140  0.2276  0.1776
          weighting
a35       span lines; exp. decay   ad hoc   80       95        0.1336     0.3000  0.2442  0.2004
          weighting
a35-AP    AP only                  ad hoc   80       95        0.2717     0.4740  0.2838  0.3140
a36       span lines; exp. decay   ad hoc   70       95        0.0792     0.1780  0.1678  0.1387
          weighting
erimr1    official                 routing  80       80        0.1219     0.3580  0.2304  0.1814
erimr2    official                 routing  75       80        0.1415     0.4240  0.2524  0.2031
r5        official                 routing  70       95        0.1225     0.3600  0.2310  0.1818
r6        span lines; exp. decay   routing  80       95        0.1015     0.2880  0.1954  0.1604
          weighting
r7        span lines; exp. decay   routing  70       95        0.0602     0.2200  0.1388  0.1123
          weighting

4.0 Results

To test our system, we ran it a number of times, varying different system parameters. After the conference, we were also able to make a few changes and run it again. Table 1 above summarizes some highlights of our results, which show a number of interesting points:

- The tests numbered erima1, erima2, erimr1, and erimr2 were the official results turned in on June 1. The first two were the ad hoc results, and the second two were the routing results.

- The tests numbered a30, a35, a36, r5, and r7 were all attempts to determine good settings for the query threshold and negation threshold parameters. Unfortunately, the results from these test runs simply do not provide anywhere near enough data to perform a complete sensitivity analysis for these parameters. Also, we noticed in some of our testing on the TREC-1 queries that there is considerable difference in the optimum values of these thresholds for different topic set/data set combinations.

- One of the motivations for using N-gram-based matching was that it provides good matching performance in the face of textual errors. To test this idea, we ran test a31 using deliberately damaged query strings. In this test, we took each query string produced by gen_query and replaced the third character with the letter "X". This works out to be an effective character recognition error rate of 4.5% over the whole body of query strings. Although the system took a considerable hit in performance, the interesting thing was that it still functioned at all. Many of the other systems in the TREC evaluation would most likely have failed completely in an analogous test, since they depend heavily on exact word matches (a short sketch illustrating this robustness follows below).

- One serious drawback to the original system was that it did not span lines when matching. That is, if the text that matched a query string crossed a line boundary in the document, the system would miss the match.
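To make the robustness point above concrete, the following is a minimal sketch, assuming a simple Dice coefficient over character trigram sets as the match score; it is an illustrative stand-in, not the system's actual threshold-based N-gram scoring, and the sample query string is invented:

    # Minimal sketch: character-trigram overlap survives the kind of
    # single-character damage used in test a31, while exact word
    # matching fails outright.  The Dice-coefficient score below is
    # an illustrative stand-in, not the system's actual match score.

    def trigrams(text, n=3):
        """Set of character n-grams, padded so word edges form n-grams."""
        text = " " + text.lower() + " "
        return {text[i:i + n] for i in range(len(text) - n + 1)}

    def similarity(a, b):
        """Dice coefficient over trigram sets: 1.0 means identical."""
        ga, gb = trigrams(a), trigrams(b)
        return 2 * len(ga & gb) / (len(ga) + len(gb))

    query = "pesticides in groundwater"    # invented sample query string
    damaged = query[:2] + "X" + query[3:]  # a31-style damage: 3rd char -> "X"

    print(similarity(query, damaged))        # 0.88: only 3 of 25 trigrams lost
    print(query.split() == damaged.split())  # False: exact word match is gone

A single substituted character disturbs only the few trigrams that overlap it, so the score degrades roughly in proportion to the character error rate, whereas a matcher keyed to exact words loses the damaged token entirely.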