SP500215
NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)
UCLA-Okapi at TREC-2: Query Expansion Experiments
chapter
E. Efthimiadis
P. Biron
National Institute of Standards and Technology
D. K. Harman
A limitation of the UCLA version of OKAPI is that it
does not allow modifications of the basic retrieval functions
(i.e., the BMs or best match functions).
4 Results and Discussion
The results of the Routing runs, the Ad hoc runs and the
Ad hoc additional runs are given in Table 1, Table 2 and
Table 3 respectively.
Routing runs
The 35 Routing runs given in Table 1 are presented in
descending order of
recall values. The runs bml5 ph Eynb] . qen. uclagsl [OCRerr]yn],
i.e., the runs without query expansion, were used as base-
line runs in order to facilitate comparisons. All other runs
reported in the table include query expansion.
The results indicate that runs with query expansion
where the r_lohi or the r_hilo algorithm was used performed
better than all other runs in terms of Recall, Average Pre-
cision, and R-Precision.
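
For readers unfamiliar with the measures named above, the following is a minimal sketch of how they are commonly computed for a single topic. It assumes the usual non-interpolated definition of Average Precision and is not the evaluation program used to produce the tables; the function names and the document identifiers are hypothetical.

```python
def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


def recall(ranking, relevant):
    """Fraction of all relevant documents that appear in the ranking."""
    if not relevant:
        return 0.0
    return sum(1 for doc in ranking if doc in relevant) / len(relevant)


def r_precision(ranking, relevant):
    """Precision at rank R, where R is the number of relevant documents."""
    if not relevant:
        return 0.0
    return precision_at_k(ranking, relevant, len(relevant))


def average_precision(ranking, relevant):
    """Mean of the precision values at each relevant document retrieved,
    divided by the total number of relevant documents (non-interpolated)."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


# Hypothetical ranking and relevance judgements for one topic.
ranking = ["d3", "d1", "d7", "d2", "d9"]
relevant = {"d1", "d2", "d5"}
print(precision_at_k(ranking, relevant, 5))
print(recall(ranking, relevant))
print(r_precision(ranking, relevant))
print(average_precision(ranking, relevant))
```

Per-run figures such as those in the tables would then be averages of these per-topic values over all topics.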
Ad hoc runs
Of the three official Ad hoc runs, uclaa1 was the au-
tomatic run that did not include query expansion and has
been used as a baseline run; uclaa2 was an automatic run
that included query expansion without any relevance infor-
mation; and uclaf1 was a run with user-supplied relevance
feedback and query expansion.
In terms of R-Precision and Average Precision the run
with feedback and query expansion (uclaf1) did better
than the automatic run with query expansion (uclaa2),
but the baseline was slightly better.
Ad hoc additional runs
The results of the Ad hoc additional runs are given in Ta-
ble 3. The official run with feedback (uclaf1) using wpq
for the expansion is compared to the runs which used the
r_lohi, r_hilo, emim and porter algorithms respectively for
the expansion. The results indicate that r_lohi and r_hilo
performed better than the other algorithms. These
results further corroborate the results obtained from the
routing runs.
In order to further validate the results the sign test as
well as the t-test were performed on the data. The results
from the sign test are given in Tables 4-15. The tables
are arranged in sequence starting from Precision at 15, 30,
and 100 documents, Average Precision, Recall-Precision,
to Recall. In each case, two tables are given; the first ta-
ble gives the differences and the second the probabilities.
As can be expected, there are no differences at Precision
at 5 documents and at Precision at 10 documents because
these were the same for all five runs. For this reason the
corresponding pairs of tables have not been included in the
paper. The results also show no significant differences at
Precision at 15 documents and at 30 documents. Signifi-
cant results appear at Precision at 100 documents where
r_lohi [OCRerr] r_hilo > emim [OCRerr] wpq [OCRerr] porter.
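
As an illustration of the paired sign test described above, the sketch below counts, for two runs scored on the same topics, how often each run wins and derives a probability from the binomial distribution with p = 0.5. It is a minimal example under stated assumptions: the paper does not say whether a one-sided or two-sided probability was reported, so a two-sided value is assumed here, and the function name and the scores are hypothetical.

```python
from math import comb


def sign_test(scores_a, scores_b):
    """Two-sided paired sign test (assumed variant).

    Returns (plus, minus, p_value), where plus counts the topics on which
    run A scores higher, minus the topics on which run B scores higher.
    Ties are discarded, as is usual for the sign test.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both runs must be scored on the same topics")
    plus = sum(1 for a, b in zip(scores_a, scores_b) if a > b)
    minus = sum(1 for a, b in zip(scores_a, scores_b) if a < b)
    n = plus + minus
    if n == 0:
        # Identical scores on every topic: no evidence of any difference.
        return plus, minus, 1.0
    k = min(plus, minus)
    # Probability of k or fewer wins out of n under Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return plus, minus, min(1.0, 2 * tail)


# Hypothetical per-topic scores for two runs.
run_a = [0.5, 0.6, 0.7, 0.4, 0.9, 0.3]
run_b = [0.4, 0.5, 0.6, 0.4, 0.8, 0.2]
print(sign_test(run_a, run_b))
```

When two runs have identical values on every topic, as reported for Precision at 5 and at 10 documents, all pairs are ties and the test has nothing to compare, which is why the sketch returns a probability of 1.0 in that case.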
The sign test results on Average Precision demonstrate
that r_lohi [OCRerr] r_hilo > wpq [OCRerr] emim [OCRerr] porter, where
emim > porter. The results on Recall show some group-
ing between the algorithms, so that r_lohi [OCRerr] r_hilo >
emim wpq > porter. The results from the Recall-
Precision indicate that r_lohi [OCRerr] r_hilo [OCRerr] emim > wpq [OCRerr]
porter, with r_lohi > emim but not significantly better and
with wpq slightly better than porter.
From the study of the sign test results certain overall
comments emerge about the performance of the five algo-
rithms. The results seem to be consistent throughout with
r_lohi performing better than the other algorithms. Dif-
ferences between emim, wpq and porter are not consistent
but it seems that emim is slightly better than wpq which is
better than porter.
To further strengthen the validity of the results the t-
test was performed on the data. The t-test results are
given in Tables 16-21. The tables are arranged in sequence
from Precision at 15, 30 and 100 documents, Average Pre-
cision, Recall-Precision, to Recall. Each table gives the
Mean difference, the standard deviation difference, the t-
statistic and the probability. As in the case with the sign
test there were no differences for Precision at 5 documents
and Precision at 10 documents and therefore the corre-
sponding tables have not been included in the paper. Sim-
ilarly, there are no significant differences at Precision at 15
documents and Precision at 30 documents. The results at
Precision at 100 documents show that r_lohi [OCRerr] r_hilo >
emim [OCRerr] wpq [OCRerr] porter; this result is the same as that of the
sign test. The results from Average Precision demonstrate
that r_lohi [OCRerr] r_hilo > emim [OCRerr] wpq [OCRerr] porter, with
emim better than porter. For Recall the results are that
r_lohi r_hilo > emim [OCRerr] wpq > porter. Finally,
the Recall-Precision results demonstrate that r_lohi [OCRerr]
r_hilo [OCRerr] emim > wpq > porter, where r_hilo is bet-
ter than emim.
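
The quantities reported in the t-test tables (mean difference, standard deviation of the differences, and t-statistic) can be sketched for one pair of runs as follows. This is a minimal illustration of a paired t-test using only the Python standard library, not the procedure actually used for the paper; the function name and the scores are hypothetical. The probability column is not computed here, since it requires the Student t distribution with n - 1 degrees of freedom; a statistics package such as SciPy (scipy.stats.ttest_rel) could supply it.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(scores_a, scores_b):
    """Paired t-test quantities for two runs scored on the same topics.

    Returns (mean_diff, sd_diff, t_stat, df). t_stat is None when every
    difference is identical, because the statistic is then undefined.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both runs must be scored on the same topics")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("at least two topics are needed")
    mean_diff = mean(diffs)
    sd_diff = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd_diff == 0:
        return mean_diff, sd_diff, None, n - 1
    t_stat = mean_diff / (sd_diff / sqrt(n))
    return mean_diff, sd_diff, t_stat, n - 1


# Hypothetical per-topic values for two runs.
run_a = [3, 5, 7, 9]
run_b = [1, 2, 3, 4]
print(paired_t(run_a, run_b))
```

Unlike the sign test, which uses only the direction of each per-topic difference, the t-test also uses its magnitude, so the two tests need not always agree.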
The results of the t-tests are consistent for the algorithms