NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Automatic Retrieval With Locality Information Using SMART
C. Buckley, G. Salton, J. Allan
National Institute of Standards and Technology
Donna K. Harman
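The runs in the tradeoff table are labelled with SMART weighting triples written as document.query. For example, lnc.ltc means documents are weighted lnc and queries ltc. This excerpt does not define the letters, so the sketch below assumes the conventional SMART meanings with natural logarithms. The first letter is the term-frequency component (n = raw tf, l = 1 + ln tf). The second is the collection-frequency component (n = none, t = ln(N/df)). The third is normalization (n = none, c = cosine). The helper names `smart_weights` and `inner_product` and the toy statistics are hypothetical. The local/global processing of runs 8, 9 and 12 is not modeled.

```python
import math


def smart_weights(tf, df, n_docs, scheme):
    """Weight a {term: raw tf} dict under a three-letter SMART code such as 'ltc'.

    Assumed letter meanings (conventional SMART, natural logs):
      1st letter (tf):   'n' raw tf,   'l' 1 + ln(tf)
      2nd letter (idf):  'n' none,     't' ln(n_docs / df[term])
      3rd letter (norm): 'n' none,     'c' cosine (scale to unit length)
    df must contain every term in tf when the idf letter is 't'.
    """
    tf_code, idf_code, norm_code = scheme
    if tf_code not in {"n", "l"} or idf_code not in {"n", "t"} or norm_code not in {"n", "c"}:
        raise ValueError(f"unsupported scheme: {scheme}")

    weights = {}
    for term, f in tf.items():
        if f <= 0:
            continue  # absent terms get no weight
        w = float(f) if tf_code == "n" else 1.0 + math.log(f)
        if idf_code == "t":
            w *= math.log(n_docs / df[term])
        weights[term] = w

    if norm_code == "c":
        length = math.sqrt(sum(w * w for w in weights.values()))
        if length > 0:
            weights = {t: w / length for t, w in weights.items()}
    return weights


def inner_product(doc_w, query_w):
    """Similarity of a weighted document and a weighted query (dot product)."""
    return sum(w * query_w.get(t, 0.0) for t, w in doc_w.items())


# Hypothetical toy statistics: 1000 documents and two terms.
N = 1000
df = {"stock": 100, "market": 10}
doc_tf = {"stock": 3, "market": 1}
query_tf = {"stock": 1, "market": 1}

# lnc.ltc: documents weighted 'lnc', queries weighted 'ltc'.
d = smart_weights(doc_tf, df, N, "lnc")
q = smart_weights(query_tf, df, N, "ltc")
print(round(inner_product(d, q), 4))
```

In this sketch, switching to another labelled scheme (for example "nnc" for documents, as in run 13) changes only how each vector is weighted. Ranking is still by the inner product of the two weighted vectors.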
Tradeoff runs

Run   Weighting   Description
STANDARD
 1.   ntc.ntc     (single terms) full 2-pass indexing
 2.   ntc.ntc     (single terms) alternate indexing method making document vectors
STOPWORD
 3.   ntc.ntc     automatic stopword (added 69 terms occurring in 10% of collection)
 4.   ntc.ntc     automatic stopword (added 350 terms occurring in 5% of collection)
 5.   ntc.ntc     automatic stopword (added 1286 terms occurring in 2% of collection)
STEMMING
 6.   ntc.ntc     only plural stemming
 7.   ntc.ntc     no stems
LOCAL/GLOBAL
*8.   ntc.ntc     local/global (single terms)
 9.   ntc.ntc     local/global (single terms), same thresholds as 2nd official run
QUERY OPTIMIZATION
10.   ntc.ntc     query efficiency optimization (15 docs guaranteed good)
PHRASES
11.   ntc.ntc     phrase dictionary (> 25 times in D1, 158,000 out of 4.7 million)
*12.  ntc.ntc     local/global (phrases)
OTHER WEIGHTS
13.   nnc.ntc     (single terms)
14.   lnc.ltc     (single terms)
15.   lnc.ltc     (phrases)

Run   Doc indexing   Query indexing   Inverted file   Other file      Retrieval time,
      time (hours)   time (seconds)   size (Mbytes)   size (Mbytes)   50 queries (seconds)
 1.   4.5/4.9        2.3 (13.6)       667             100             358
 2.   4.7/0.7        2.7                              790
 3.   4.3/4.6        3.2 (13.1)       624             100             306
 4.   4.0/4.3        3.0 (12.8)       528             100             166
 5.   3.7/3.9        2.8              381             100             78
 6.   4.3/5.0        2.7              724             98              251
 7.   4.2/4.7        1.6              752             98              235
*8.   4.7/0.7        2.7              667             790             1465
 9.
10.                                                                   97
11.   7.5/8.0        3.8              892             104             415
*12.  9.7/0.9        2.7              892             1040            2405
13.   4.5            (88.5)           667             89              262**
14.   4.5            2.7
15.   8.1                             892             104             396

Retrieval effectiveness (averaged over 47 queries)

Run   11-pt   NumRel Total   Recall/prec at 200
 1.   1813    3114           2614/3313
 2.
 3.   1828    3101           2587/3299
 4.   1750    2978           2524/3168
 5.   1538    2658           2237/2828
 6.   1745    3148           2605/3349
 7.   1709    3101           2545/3299
*8.   1783    3150           2636/3351
 9.   1982    3400           2856/3617
10.   1693    2983           2476/3173
11.   1903    3298           2814/3509
*12.  2080    3555           3076/3782
13.   1818    3203           2614/3407
14.   2249    3746           3272/3985
15.   2424    3886           3394/4134

* indicates official TREC run.
** Timing on machine with 128 Mbyte memory.
Query timing numbers in parentheses indicate CPU time using dictionary on disk.
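Runs 3 to 5 extend the stoplist automatically with terms that occur in at least a given fraction of the collection (10%, 5% and 2%, adding 69, 350 and 1286 terms respectively). The following is a minimal Python sketch of that selection. It assumes that "occurring in" means document frequency and that the cutoff is inclusive. The helper `frequent_terms` and the toy documents are hypothetical and are not taken from SMART.

```python
from collections import Counter


def frequent_terms(docs, min_fraction):
    """Return terms that occur in at least min_fraction of the documents.

    Hypothetical helper: document frequency counts each term once per
    document, and a term qualifies when df >= min_fraction * len(docs).
    """
    df = Counter()
    for doc in docs:
        df.update(set(doc.lower().split()))  # count each term once per document
    cutoff = min_fraction * len(docs)
    return {term for term, n in df.items() if n >= cutoff}


# Toy collection, invented for illustration.
docs = [
    "the market rose",
    "the bank raised rates",
    "rates fell as the market closed",
    "a new bank opened",
]

# Lowering the threshold can only add terms to the automatic stoplist.
for fraction in (0.75, 0.5):
    print(fraction, sorted(frequent_terms(docs, fraction)))
```

Because the selected set for a lower fraction always contains the set for a higher one, the 2% stoplist of run 5 necessarily includes every term added in runs 3 and 4.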
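The Recall/prec at 200 figures can be cross-checked against NumRel Total. Read the printed entries as fractions with the leading decimal point omitted (3313 = .3313). Precision at 200 then equals NumRel Total / (47 × 200). Every query contributes exactly 200 documents, so pooling the relevant counts gives the same value as averaging per-query precision. Recall cannot be recovered this way, because the number of relevant documents differs from query to query. The helper name below is hypothetical. The (NumRel Total, precision) pairs are copied from the table.

```python
QUERIES = 47   # the table averages over 47 queries
CUTOFF = 200   # documents examined per query


def precision_at_cutoff(rel_retrieved_total, queries=QUERIES, cutoff=CUTOFF):
    """Pooled precision: relevant documents retrieved / (queries * cutoff)."""
    return rel_retrieved_total / (queries * cutoff)


# (NumRel Total, printed precision at 200 without its decimal point)
rows = [(3114, 3313), (2658, 2828), (3746, 3985), (3886, 4134)]
for total, printed in rows:
    p = precision_at_cutoff(total)
    print(total, f"{p:.4f}", round(p * 10000) == printed)
```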