NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)
Edited by D. K. Harman, National Institute of Standards and Technology

N-Gram-Based Text Filtering For TREC-2
W. Cavnar
TABLE 1. Selected TREC-2 Test Results

Test     Note                              Test     Query    Negation  Average    Prec. at  Prec. at   R-
Number                                     Type     Thresh.  Thresh.   Precision  10 Docs   100 Docs   Precision
erima1   official                          ad hoc   80       80        0.1589     0.3960    0.3016     0.2179
erima2   official                          ad hoc   75       80        0.1885     0.4480    0.3426     0.2494
a30      official                          ad hoc   70       95        0.1604     0.4040    0.3038     0.2198
a31      official w/ X queries             ad hoc   70       95        0.1153     0.4000    0.259      0.1904
a33      spanning lines                    ad hoc   70       95        0.0734     0.15      0.1484     0.1333
a34      span lines; exp. decay weighting  ad hoc   90       95        0.1099     0.314     0.2276     0.1776
a35      span lines; exp. decay weighting  ad hoc   80       95        0.1336     0.3000    0.2442     0.2004
a35-AP   AP only                           ad hoc   80       95        0.2717     0.4740    0.2838     0.3140
a36      span lines; exp. decay weighting  ad hoc   70       95        0.0792     0.1780    0.1678     0.1387
erimr1   official                          routing  80       80        0.1219     0.3580    0.2304     0.1814
erimr2   official                          routing  75       80        0.1415     0.4240    0.2524     0.2031
r5       official                          routing  70       95        0.1225     0.3600    0.2310     0.1818
r6       span lines; exp. decay weighting  routing  80       95        0.1015     0.2880    0.1954     0.1604
r7       span lines; exp. decay weighting  routing  70       95        0.0602     0.2200    0.1388     0.1123
4.0 Results
To test our system, we ran it a number of times, varying different system parameters. After the conference, we were also able to make a few changes and run it again. Table 1 above summarizes some highlights of our results, which show a number of interesting points:
The tests numbered erima1, erima2, erimr1, and erimr2 were the official results turned in on June 1. The first two were the ad hoc results, and the second two were the routing results.
The tests numbered a30, a35, a36, r5, and r7 were all attempts to determine good settings for the query threshold and negation threshold parameters. Unfortunately, the results from these test runs simply do not provide anywhere near enough data to perform a complete sensitivity analysis for these parameters; a full analysis would require a systematic sweep like the one sketched below. Also, we noticed in some of our testing on the TREC-1 queries that there is considerable difference in the optimum values of these thresholds for different topic set/data set combinations.
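For concreteness, a full sensitivity analysis would look something like the following Python sketch, which sweeps the two thresholds over the ranges the runs in Table 1 only sample. The evaluate_run function is a synthetic stand-in for "run the N-gram filter at these settings and score the ranking"; nothing here is the system's actual code.

```python
# A minimal sketch (not the paper's code) of a full threshold sweep.
# evaluate_run is a synthetic placeholder for running the filter and
# scoring it against the relevance judgments; its numbers are
# meaningless outside this illustration.

def evaluate_run(query_thresh: int, negation_thresh: int) -> float:
    """Placeholder score with an arbitrary peak at (75, 80)."""
    return 1.0 / (1 + abs(query_thresh - 75) + abs(negation_thresh - 80))

# Sweep the ranges actually probed in Table 1 (70-90 and 80-95).
results = {
    (q, n): evaluate_run(q, n)
    for q in range(70, 91, 5)   # query threshold
    for n in range(80, 96, 5)   # negation threshold
}

best = max(results, key=results.get)
print("best (query thresh., negation thresh.):", best)
```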
One of the motivations for using N-gram-based matching was that it provides good matching performance in the face of textual errors. To test this idea, we ran test a31 using deliberately damaged query strings. In this test, we took each query string produced by gen_query and replaced the third character with the letter "X". This works out to an effective character recognition error rate of 4.5% over the whole body of query strings. Although the system took a considerable hit in performance, the interesting thing was that it still functioned at all. Many of the other systems in the TREC evaluation would most likely have failed completely in an analogous test, since they depend heavily on exact word matches.
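The damage procedure is easy to make concrete. The Python sketch below (illustrative only; the sample query strings and helper names are ours, not the paper's) applies the same substitution rule and shows why an N-gram profile survives it: a single wrong character can destroy at most three trigrams.

```python
# A minimal sketch (not the paper's code) of the query-damage test:
# replace the third character of each query string with "X", then
# check how much of the query's trigram profile survives.

def damage(query: str) -> str:
    """Replace the third character with 'X', as in test a31."""
    return query[:2] + "X" + query[3:] if len(query) >= 3 else query

def trigrams(s: str) -> set:
    """The character trigrams of s, the system's unit of matching."""
    return {s[i:i + 3] for i in range(len(s) - 2)}

# Illustrative query strings; the real ones came from gen_query.
queries = ["pesticide exposure", "superconductivity research"]

# One substituted character per query; over the full TREC query set
# this substitution worked out to the 4.5% error rate quoted above.
changed = sum(1 for q in queries if len(q) >= 3)
total_chars = sum(len(q) for q in queries)
print(f"effective error rate here: {changed / total_chars:.1%}")

# Most of the trigram profile still matches after the damage, which
# is why retrieval degrades gracefully rather than failing outright.
for q in queries:
    d = damage(q)
    survived = len(trigrams(q) & trigrams(d)) / len(trigrams(q))
    print(f"{q!r} -> {d!r}: {survived:.0%} of trigrams survive")
```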
One serious drawback to the original system was that it did not span lines when matching. That is, if the text that