NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)
Recent Developments in Natural Language Text Retrieval
T. Strzalkowski
J. Carballo
National Institute of Standards and Technology
D. K. Harman
The table below illustrates the problem of weighting phrasal terms using topic 101 and a relevant document (WSJ870226-0091).
Topic 101 matches WSJ870226-0091 (duplicate terms not shown)

TERM                   TF.IDF   NEW WEIGHT
sdi                      1750         1750
eris                     3175         3175
star                     1072         1072
wars                     1670         1670
laser                    1456         1456
weapon                   1639         1639
missile                   872          872
space+base               2641         2105
interceptor              2075         2075
exoatmospheric           1879         3480
system+defense           2846         2219
reentry+vehicle          1879         3480
initiative+defense       1646         2032
system+interceptor       2526         3118
DOC RANK                   30           10
Changing the weighting scheme for compound terms, along with other minor improvements (such as expanding the stopword list for topics, or correcting a few parsing bugs), has led to an overall increase in precision of nearly 20% over our official TREC-2 ad-hoc results. Table 4 summarizes these new runs for queries 101-150 against the WSJ database. Similar improvements have been obtained for queries 51-100.
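For concreteness, the sketch below shows a generic tf.idf-style term weight with a separate, tunable factor applied to compound ("head+modifier") terms, and a document score that sums the weights of matching query terms. It is only an illustration of where such a reweighting hook fits into scoring: the function names and the phrase_factor parameter are hypothetical, and the formula is not the revised weighting actually used in these runs.

```python
import math

def tfidf(tf, df, num_docs):
    """Generic tf.idf weight: damped term frequency times inverse document frequency."""
    if tf == 0 or df == 0:
        return 0.0
    return (1.0 + math.log(tf)) * math.log(num_docs / df)

def term_weight(term, tf, df, num_docs, phrase_factor=1.0):
    """Weight one query term; compound terms (written "head+modifier",
    e.g. "space+base") receive an extra, separately tuned factor.
    The factor and its default are illustrative assumptions only."""
    weight = tfidf(tf, df, num_docs)
    if "+" in term:
        weight *= phrase_factor
    return weight

def document_score(query_terms, doc_tf, df, num_docs, phrase_factor=1.0):
    """Rank score of one document: sum of the weights of the query terms it matches."""
    return sum(
        term_weight(t, doc_tf.get(t, 0), df.get(t, 0), num_docs, phrase_factor)
        for t in query_terms
    )
```

In this setting, reweighting even a handful of compound terms can move a relevant document substantially in the final ranking, which is the effect summarized by the DOC RANK row of the table above (rank 30 under tf.idf, rank 10 under the revised weights).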
The results of the routing runs against the SJMN database are somewhat more troubling. Applying the new weighting scheme, we did see the average precision increase by some 5 to 12% (see column 4 in Table 3), but the results remain far below those for the ad-hoc runs. Direct runs of queries 51-100 against the SJMN database produce results that are about the same as in the routing runs (which may indicate that our routing scheme works fine); however, the same queries run against the WSJ database achieve retrieval precision some 25% above the SJMN runs. This may indicate some problems with the SJMN database or the relevance judgements for it.
"HOT SPOT" RETRIEVAL
Another difficulty with frequency-based term weighting arises when a long document needs to be retrieved on the basis of a few short relevant passages. If the bulk of the document is not directly relevant to the query, then there is a strong possibility that the document will score low in the final ranking, despite some strongly relevant material in it. This problem can be dealt with by subdividing long documents at paragraph breaks, or into approximately equal-length fragments, and indexing the database with respect to these (e.g., Kwok 1993).
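A minimal sketch of this kind of subdivision is given below: it breaks a document at paragraph boundaries and groups paragraphs into roughly equal-length fragments that could then be indexed separately. The split_into_passages name, the 200-word target, and the whitespace tokenization are arbitrary assumptions for illustration, not the parameters used by Kwok (1993) or in the experiments reported here.

```python
def split_into_passages(text, target_words=200):
    """Break a long document at paragraph boundaries and group paragraphs
    into fragments of roughly target_words words each, so that each
    fragment can be indexed and scored on its own."""
    passages, current, length = [], [], 0
    for paragraph in text.split("\n\n"):
        words = paragraph.split()
        if not words:
            continue
        current.extend(words)
        length += len(words)
        if length >= target_words:
            passages.append(" ".join(current))
            current, length = [], 0
    if current:  # flush the last, possibly short, fragment
        passages.append(" ".join(current))
    return passages
```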
Run                   nyuir1    nyuir1a   nyuir2    nyuir2a
Name                  ad-hoc    ad-hoc    ad-hoc    ad-hoc
Queries               50        50        50        50
Total number of docs over all queries
  Ret                 49884     50000     49876     50000
  Rel                 3929      3929      3929      3929
  RelRet              2983      3108      3274      3401
Recall (interp) Precision Averages
  0.00                0.7013    0.7201    0.7528    0.8063
  0.10                0.4874    0.5239    0.5567    0.6198
  0.20                0.4326    0.4751    0.4721    0.5566
  0.30                0.3531    0.4122    0.4060    0.4786
  0.40                0.3076    0.3541    0.3617    0.4257
  0.50                0.2637    0.3126    0.3135    0.3828
  0.60                0.2175    0.2752    0.2703    0.3380
  0.70                0.1617    0.2142    0.2231    0.2817
  0.80                0.1176    0.1605    0.1667    0.2164
  0.90                0.0684    0.1014    0.0915    0.1471
  1.00                0.0102    0.0194    0.0154    0.0474
Average precision over all relevant docs
  Avg                 0.2649    0.3070    0.3111    0.3759
Precision at
  5 docs              0.4920    0.5200    0.5360    0.6040
  10 docs             0.4420    0.4900    0.4880    0.5580
  15 docs             0.4240    0.4653    0.4693    0.5253
  20 docs             0.4050    0.4420    0.4390    0.4980
  30 docs             0.3640    0.3993    0.4067    0.4607
  100 docs            0.2720    0.2914    0.3094    0.3346
  200 docs            0.1886    0.2064    0.2139    0.2325
  500 docs            0.1026    0.1103    0.1137    0.1229
  1000 docs           0.0597    0.0622    0.0655    0.0680
R-Precision (after Rel)
  Exact               0.3003    0.3332    0.3320    0.3950

Table 4. Automatic ad-hoc run statistics for queries 101-150 against the
WSJ database: (1) nyuir1 - TREC-2 official run with <desc> and <narr>
fields only; (2) nyuir1a - revised term weighting run; (3) nyuir2 -
official TREC-2 run with <desc>, <con>, and <fac> fields only; and (4)
nyuir2a - revised weighting run.
While such approaches are effective, they also tend to be costly because of increased index size and more complicated