SP500215
NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)
Okapi at TREC-2
chapter
S. Robertson
S. Walker
S. Jones
M. Hancock-Beaulieu
M. Gatford
National Institute of Standards and Technology
D. K. Harman
the top 1000 IDs; if less than three the top 1000 from
the set which was finally "current" were output.
There seemed to be an impression that the new top-
ics (topics3) are more difficult than the old. Results
may also have been affected by the huge stoplist which
was being used at that time because of a breakdown
of the only disk large enough to hold the very large
scratch files generated during inversion. Lack of the
number "6" affected one topic, days of the week an-
other ("Black Monday"). The searcher was urged to
leave "Black Monday" to the end in case we were able
to reindex before the deadline, but she decided to try it
and thought it worked quite well.
An edited transcript of one searcher's notes is given
below as Appendix B.
5.3 Results
The official results of the manual run (Table 5) are dis-
appointing, with average precision 0.232 (60% of topics
below median), precision at 100 docs 0.4 and recall 0.59.
The final iteration was later re-run with BM11 instead
of BM15, and the results combined with the feedback
documents from the original searches for a frozen ranks
evaluation5. This did somewhat better on a majority
of the topics, but overall the manual results were very
poor compared to some of the automatic runs.
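
The frozen ranks combination mentioned above can be illustrated with a minimal sketch. It assumes one common reading of the method: documents already shown to the searcher keep their original rank positions, and the remaining positions are filled, in order, from the new ranking. The function name `frozen_ranks` and its arguments are hypothetical and are not taken from the Okapi system.

```python
def frozen_ranks(original, frozen_docs, rerun):
    """Merge two rankings under a frozen-ranks assumption.

    original    -- ranked list of document IDs from the original search
    frozen_docs -- IDs already seen by the searcher (kept at their ranks)
    rerun       -- ranked list of document IDs from the re-run
    """
    frozen_docs = set(frozen_docs)
    # Candidates from the re-run, skipping anything that is frozen.
    fill = iter([d for d in rerun if d not in frozen_docs])
    merged = []
    for doc in original:
        if doc in frozen_docs:
            merged.append(doc)          # frozen: keeps its original position
        else:
            nxt = next(fill, None)      # free slot: take next re-run document
            if nxt is not None:
                merged.append(nxt)
    merged.extend(fill)                 # any re-run documents left over
    return merged


print(frozen_ranks(["d1", "d2", "d3", "d4"],
                   {"d1", "d3"},
                   ["d9", "d3", "d2", "d7", "d1"]))
```

In this toy call, d1 and d3 stay at ranks 1 and 3, while the free positions are taken by d9, d2 and d7 from the re-run.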
6 Other experiments
6.1 Query modification without relevance information
Some iterative automatic ad hoc runs were done in
which the top 10-50 documents obtained by the best
existing method were used (a) as a source of additional
terms and (b) as a source of "relevance" information for
the w(1) weight calculation.
Expansion terms were selected as described in Section
4.2, in descending order of (r/R) × w(1). The maximum
number of additional terms was set at half the number
of query terms. For many of the topics most of the top
terms extracted from the feedback documents were in
any case topic terms, so the number of additional terms
was small.
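
The procedure in this subsection can be sketched roughly as follows. This is an illustration under stated assumptions, not the Okapi implementation: it assumes that w(1) is the usual Robertson/Sparck Jones relevance weight with 0.5 corrections, that r is the number of feedback documents containing a term and R the number of feedback documents, and that terms are ranked by (r/R) × w(1). The names `rsj_weight`, `select_expansion_terms`, `doc_freq` and `N` are hypothetical, and the exact stopping rule used in the experiments is not specified beyond the cap of half the number of query terms.

```python
import math


def rsj_weight(r, n, R, N):
    """Robertson/Sparck Jones weight (assumed form of w(1)).

    r -- feedback documents containing the term
    n -- documents in the collection containing the term
    R -- number of feedback documents (treated as "relevant")
    N -- number of documents in the collection
    """
    return math.log(((r + 0.5) * (N - n - R + r + 0.5)) /
                    ((n - r + 0.5) * (R - r + 0.5)))


def select_expansion_terms(query_terms, feedback_docs, doc_freq, N):
    """Pick extra terms from the top-ranked documents of an earlier run."""
    R = len(feedback_docs)
    # Count, for each term, how many feedback documents contain it.
    r_counts = {}
    for doc in feedback_docs:
        for term in set(doc):
            r_counts[term] = r_counts.get(term, 0) + 1
    # Rank all candidate terms by the assumed selection value (r/R) * w(1).
    scored = []
    for term, r in r_counts.items():
        value = (r / R) * rsj_weight(r, doc_freq[term], R, N)
        scored.append((value, term))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    # At most half as many additional terms as there are query terms.
    limit = len(query_terms) // 2
    extra = []
    for value, term in scored:
        if len(extra) == limit:
            break
        if term not in query_terms and value > 0:
            extra.append(term)
    return extra


docs = [["biotech", "fund", "drug"],
        ["biotech", "drug", "investor"],
        ["biotech", "fund", "the"]]
df = {"biotech": 20, "fund": 50, "drug": 30, "investor": 40, "the": 900}
print(select_expansion_terms(["biotech", "fund"], docs, df, 1000))
```

With two query terms the cap is one additional term, so only "drug" is returned in this toy example; the very frequent term "the" gets a negative weight and is never selected.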
Example (topic 112)
Topic 112: Funding biotechnology
30 feedback documents used
In the table which follows, term sources are given either as
doc, in the case of expansion terms, or as a topic field, where
tit > con > nar > desc. In this example, final weights involve
a qtf component, and were obtained using equation 6 with
[...] (the resulting weight was multiplied by k3 to obtain
adequate granularity in an integer representation). For ex-
pansion terms, qtf was taken as 1 and the same correction
applied.

5 There were two topics where the searcher found no relevant
documents, so for these topics the original results were inserted.
                                                 Weights
Term             Src          qtf          docs  Orig  (unlabeled)  Final
biotechnologi    tit          9            30    765   145          614
[illegible]      [illegible]  4            29    148   80           213
fund             tit          2            23    78    55           88
capit            nar          2            21    78    51           81
pharmaceut       doc          (0)          15    -     73           64
ventur           nar          1            21    55    67           59
financi...       nar          2            17    64    36           57
startup...       nar          1            11    70    62           55
research         nar          1            26    35    61           54
financ           doc          (0)          15    -     54           48
partner          doc          (0)          17    -     55           48
drug             doc          (0)          18    -     53           47
investor         doc          (0)          19    -     52           46
provid           nar          [illegible]  14    66    21           45
[illegible]      [illegible]  1            22    36    50           44
technologi       doc          (0)          23    -     50           44
company...       doc          (0)          28    -     48           42
academ           nar          1            4     73    48           42
corpor           nar          2            9     76    26           41
[illegible]      desc         1            18    37    43           38
stock            nar          1            20    33    43           38
industri         doc          (0)          23    -     42           37
develop          doc          (0)          25    -     42           37
laboratori       nar          1            9     51    39           34
quantifi         nar          1            1     82    39           34
profit           nar          1            14    40    38           33
enterpr          nar          1            4     59    33           29
establish        nar          1            10    38    29           25
arena*           nar          2            0     148   15           24
data             nar          4            6     108   8            21
sale             nar          1            12    30    24           21
loss             nar          1            7     39    22           19
government...    nar          1            13    24    20           17
assist           nar          1            6     39    20           17
much             desc         1            11    28    20           17
answer           desc         1            2     52    16           14
follow           nar          1            7     26    9            8
rel*             desc         1            1     52    9            8
eg*              nar          1            0     67    8            7
question         desc         1            3     37    8            7
worldwid*        nar          2            0     126   4            6
division*        nar          1            2     41    6            5
fig[illegible]*  nar          1            2     41    5            4

6 The terms followed by ellipses represent synonym classes.

Here, nine of the 43 terms are not from the topic. The
starred terms were not used in the final search because
their selection value (r/R) × w(1) is zero (to the nearest
integer). For this topic, the additional terms were
beneficial and reweighting alone rather neutral.
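
The arithmetic behind the topic 112 example can be checked with a small sketch. Several things here are assumptions rather than statements from the text: that equation 6 (not reproduced in this section) scales a term weight by qtf / (k3 + qtf), that the weight listed just before each final weight in the table is the weight before the qtf component is applied, that results are truncated to integers, and that k3 = 8. The value 8 is inferred only because it reproduces the tabulated final weights. The function names are hypothetical.

```python
def final_weight(w, qtf, k3=8):
    """Assumed final weight: k3 * w * qtf / (k3 + qtf), truncated.

    k3 = 8 is an inferred value, not one stated in the text.
    Expansion (doc) terms use qtf = 1.
    """
    return (k3 * w * qtf) // (k3 + qtf)


def selection_value(r, R, w):
    """Assumed selection value (r/R) * w, rounded to the nearest integer."""
    return round(r / R * w)


# (weight before qtf, qtf, tabulated final weight) for the first table rows;
# the fourth row is an expansion term, so qtf is taken as 1.
rows = [(145, 9, 614), (80, 4, 213), (55, 2, 88), (73, 1, 64)]
for w, qtf, tabulated in rows:
    print(w, qtf, final_weight(w, qtf), tabulated)

# Starred term rel* (1 of 30 feedback documents, weight 9) rounds to zero,
# while the unstarred term question (3 of 30, weight 8) rounds to one.
print(selection_value(1, 30, 9), selection_value(3, 30, 8))
```

Under these assumptions the computed values match the tabulated final weights for the rows shown, and the starred terms are exactly those whose selection value rounds to zero with R = 30 feedback documents.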