NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)
UCLA-Okapi at TREC-2: Query Expansion Experiments
E. Efthimiadis, P. Biron
National Institute of Standards and Technology, D. K. Harman

A limitation of the UCLA version of OKAPI is that it does not allow modifications of the basic retrieval functions (i.e., the BMs, or best-match functions).

4 Results and Discussion

The results of the Routing runs, the Ad hoc runs, and the Ad hoc additional runs are given in Table 1, Table 2, and Table 3, respectively.

Routing runs

The 35 Routing runs given in Table 1 are presented in descending order of recall. The runs bml5 ph Eynb] . qen. uclagsl [OCRerr]yn], i.e., the runs without query expansion, were used as baseline runs in order to facilitate comparisons. All other runs reported in the table include query expansion. The results indicate that the runs with query expansion in which the r_lohi or the r_hilo algorithm was used performed better than all other runs in terms of Recall, Average Precision, and R-Precision.

Ad hoc runs

Of the three official Ad hoc runs, uclaa1 was the automatic run that did not include query expansion and has been used as the baseline run; uclaa2 was an automatic run that included query expansion without any relevance information; and uclaf1 was a run with user-supplied relevance feedback and query expansion. In terms of R-Precision and Average Precision, the run with feedback and query expansion (uclaf1) did better than the automatic run with query expansion (uclaa2), but the baseline (uclaa1) was slightly better than both.

Ad hoc additional runs

The results of the Ad hoc additional runs are given in Table 3. The official run with feedback (uclaf1), which used wpq for the expansion, is compared to the runs that used the r_lohi, r_hilo, emim, and porter algorithms, respectively, for the expansion. The results indicate that r_lohi and r_hilo performed better than the other algorithms.
These results further corroborate the results obtained from the routing runs.

To further validate the results, both the sign test and the t-test were performed on the data. The results from the sign test are given in Tables 4-15. The tables are arranged in sequence, starting from Precision at 15, 30, and 100 documents, through Average Precision and Recall-Precision, to Recall. In each case two tables are given: the first table gives the differences and the second the probabilities. As expected, there are no differences at Precision at 5 documents and at Precision at 10 documents, because these values were the same for all five runs. For this reason the corresponding pairs of tables have not been included in the paper. The results also show no significant differences at Precision at 15 documents and at 30 documents. Significant results appear at Precision at 100 documents, where r_lohi ≥ r_hilo > emim ≥ wpq ≥ porter. The sign test results on Average Precision demonstrate that r_lohi ≥ r_hilo > wpq ≥ emim ≥ porter, where emim > porter. The results on Recall show some grouping between the algorithms, so that r_lohi ≥ r_hilo > emim ≥ wpq > porter. The results from Recall-Precision indicate that r_lohi ≥ r_hilo ≥ emim > wpq ≥ porter, with r_lohi better than emim, but not significantly so, and with wpq slightly better than porter.

From the study of the sign test results, certain overall comments emerge about the performance of the five algorithms. The results seem to be consistent throughout, with r_lohi performing better than the other algorithms. Differences between emim, wpq, and porter are not consistent, but it seems that emim is slightly better than wpq, which in turn is better than porter.

To further strengthen the validity of the results, the t-test was performed on the data. The t-test results are given in Tables 16-21.
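
As an illustration only, the two paired tests named above can be sketched as follows. The paper does not state which variants were used, so this sketch assumes a two-sided exact sign test with ties dropped and a standard paired t-statistic computed over per-topic scores of two runs. The function names `sign_test` and `paired_t` and the sample scores are hypothetical and do not come from the paper.

```python
from math import comb, sqrt
from statistics import mean, stdev


def sign_test(a, b):
    """Two-sided exact sign test on paired scores (assumed variant).

    Returns (positive differences, negative differences, p-value).
    Pairs with identical scores are dropped before counting.
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]
    n = len(diffs)
    if n == 0:
        return 0, 0, 1.0  # no informative pairs
    pos = sum(d > 0 for d in diffs)
    neg = n - pos
    k = min(pos, neg)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return pos, neg, min(1.0, 2 * tail)


def paired_t(a, b):
    """Mean difference, standard deviation of differences, and t-statistic.

    Needs at least two pairs with non-constant differences; the
    probability would come from a t distribution with n - 1 degrees
    of freedom, which is not computed here.
    """
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    m = mean(diffs)
    s = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    return m, s, m / (s / sqrt(n))


if __name__ == "__main__":
    # Hypothetical per-topic scores for two runs, for illustration only
    run_a = [0.42, 0.31, 0.55, 0.27, 0.48]
    run_b = [0.38, 0.31, 0.49, 0.30, 0.41]
    print(sign_test(run_a, run_b))
    print(paired_t(run_a, run_b))
```

Both functions take the scores of two runs in the same topic order, so each position forms one matched pair.
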
The tables are arranged in sequence from Precision at 15, 30, and 100 documents, through Average Precision and Recall-Precision, to Recall. Each table gives the mean difference, the standard deviation of the differences, the t-statistic, and the probability. As with the sign test, there were no differences for Precision at 5 documents and Precision at 10 documents, and therefore the corresponding tables have not been included in the paper. Similarly, there are no significant differences at Precision at 15 documents and Precision at 30 documents. The results at Precision at 100 documents show that r_lohi ≥ r_hilo > emim ≥ wpq ≥ porter; this result is the same as that of the sign test. The results from Average Precision demonstrate that r_lohi ≥ r_hilo > emim ≥ wpq ≥ porter, with emim better than porter. For Recall the results are that r_lohi ≥ r_hilo > emim ≥ wpq > porter. Finally, the Recall-Precision results demonstrate that r_lohi ≥ r_hilo ≥ emim > wpq > porter, where r_hilo is better than emim. The results of the t-tests are consistent for the algorithms