NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)
Effective and Efficient Retrieval from Large and Dynamic Document Collections
D. Knaus
P. Schauble
National Institute of Standards and Technology
D. K. Harman
4 Experiments
In this section, we present the evaluation of the method
described above and compare it to other weighting schemes.
We focus on the efficiency of modifying documents and on
the correlation between retrieval efficiency and retrieval
effectiveness. We also examine the influence of the
vocabulary restriction on both retrieval effectiveness and
retrieval efficiency. For the final evaluations we
concentrated on the ad hoc queries.
Before discussing the results, we define what we mean
by a partition, a run, and an experiment. The document
collection has been split up into several partitions, each
consisting of at most 100,000 documents. Thus, the
large collections DOE1 (Department of Energy, Disk 1)
and ZIFF3 (Ziff-Davis Publishing, Disk 3) were divided
into three and two partitions respectively. A run consists
of the evaluation of 50 queries (either the set of
routing queries or the set of ad hoc queries) against all
documents of one partition. For each query, the 1000
top ranked documents have been retrieved. An experiment
consists of several runs and the merging of the
lists of ranked documents for each query. For TREC-2,
the two sets of experiments "Topics 51-100 versus Disk
3" and "Topics 101-150 versus Disks 1 and 2" have been
evaluated.
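The paper does not spell out the merging step itself; under the assumption that the retrieval status values (RSVs) are directly comparable across partitions (the idf statistics come from the whole collection), a minimal Python sketch with hypothetical names could look like this:

```python
import heapq
import itertools

def merge_partition_runs(partition_results, cutoff=1000):
    """Merge the ranked lists of several partition runs for one query.

    partition_results: one list of (rsv, doc_id) pairs per partition,
    each already sorted by descending retrieval status value (RSV).
    Assumes RSVs are directly comparable across partitions.
    """
    merged = heapq.merge(*partition_results,
                         key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in itertools.islice(merged, cutoff)]

# Hypothetical usage for one query against the three DOE1 partitions:
# ranking = merge_partition_runs([run_p1, run_p2, run_p3])
```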
All efficiency evaluations are based on CPU time
rather than on real time in order to eliminate side effects
from other jobs running on the same machine. In these
experiments, we used a SUN SPARCserver MP690 with
128 MBytes of RAM.
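As an aside, the distinction between CPU time and real (wall-clock) time can be made explicit in code; the Python fragment below only illustrates that distinction and is not the instrumentation actually used:

```python
import time

def workload():
    # Placeholder standing in for the retrieval job being timed.
    return sum(i * i for i in range(10**6))

cpu_start = time.process_time()    # CPU time of this process only
wall_start = time.perf_counter()   # wall-clock time, for comparison

workload()

cpu_seconds = time.process_time() - cpu_start
wall_seconds = time.perf_counter() - wall_start
print(f"CPU: {cpu_seconds:.3f}s, wall: {wall_seconds:.3f}s")
```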
We derived the document descriptions directly from
the CDs. The indexing process included the elimination
of stop words (van Rijsbergen's stop list [12, p. 18]) and
Porter's word reduction algorithm [6]. The normalized
inverse document frequencies have been derived from
the documents of disks 1 and 2 only. Uncompressing
and indexing a single document takes around 100 msec
on average, depending on the length of the document.
The computation of the inverse document frequencies
from the descriptions took about 1.5 hours of CPU time.
The average time for inserting a document description
into the access structure is on the order of 10 msec, again
depending on the number of features per document.
Inserting a document description into an inverted file
would need more time because the postings would have to be
inserted into the different lists associated with each feature.
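A minimal sketch of this indexing step, using NLTK's Porter stemmer in place of the original implementation and a tiny placeholder set standing in for van Rijsbergen's full stop list [12]:

```python
import re
from nltk.stem.porter import PorterStemmer

# Placeholder only: stands in for van Rijsbergen's full stop list.
STOP_WORDS = {"the", "of", "and", "a", "to", "in", "is", "it"}

stemmer = PorterStemmer()

def index_document(text):
    """Turn raw document text into a list of indexing features:
    tokenize, drop stop words, and reduce words with Porter's algorithm."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stemmer.stem(t) for t in tokens if t not in STOP_WORDS]
```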
The restriction of the vocabulary was accomplished
by omitting features occurring in more than 15% of all
documents (from disks 1 and 2), i.e. in more than
111,337 documents. We chose the 15% limit, although
even a stricter limit of 10% should not
affect the retrieval effectiveness [1]. In our experiments
we compare the 15% limit ("df15") to an unrestricted
vocabulary ("all").
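Under these assumptions (document frequencies computed over disks 1 and 2; hypothetical names), the restriction can be sketched as follows:

```python
def restrict_vocabulary(df, n_docs, max_fraction=0.15):
    """Keep only features occurring in at most max_fraction of all documents.

    df: dict mapping feature -> document frequency (number of documents
    containing the feature), here computed over disks 1 and 2. With
    max_fraction=0.15 this reproduces the "df15" vocabulary (15% of the
    collection corresponds to 111,337 documents); max_fraction=1.0
    corresponds to "all".
    """
    threshold = max_fraction * n_docs
    return {feat for feat, freq in df.items() if freq <= threshold}
```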
We now have three parameters which can be combined
to specify eight different retrieval methods. Each
method can be identified by a string built from the labels
for the document feature weighting, the query feature
weighting and the vocabulary restriction:
(doc_feat_weight).(query_feat_weight).(vocab)
In what follows, we present the results of the following
nine methods (the standard tf * idf baseline M0 plus the
eight combinations; a sketch of the weighting schemes
follows the list):
M0 ntc.ntn.all
M1 lnc.ltn.all
M2 lnc.ltn.df15
M3 lnc.ntn.all
M4 lnc.ntn.df15
M5 ltc.ltn.all
M6 ltc.ltn.df15
M7 ltc.ntn.all
M8 ltc.ntn.df15
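These labels follow the SMART triple notation: term-frequency component, collection-frequency (idf) component, and normalization. The paper does not restate the formulas in this section; the sketch below assumes the usual SMART definitions ('l' = 1 + log tf, 't' = log(N/df), 'c' = cosine normalization), so the "normalized" idf mentioned above may differ in detail:

```python
import math

def smart_weights(tf, df, n_docs, scheme):
    """Weight the features of one text under a three-letter SMART code.

    tf: dict feature -> term frequency in the text
    df: dict feature -> document frequency in the collection
    scheme: e.g. "lnc" for documents or "ltn" for queries (method M1)
    """
    tf_code, idf_code, norm_code = scheme
    weights = {}
    for feat, freq in tf.items():
        t = 1.0 + math.log(freq) if tf_code == "l" else float(freq)
        i = math.log(n_docs / df[feat]) if idf_code == "t" else 1.0
        weights[feat] = t * i
    if norm_code == "c":                      # cosine normalization
        norm = math.sqrt(sum(w * w for w in weights.values()))
        weights = {f: w / norm for f, w in weights.items()}
    return weights

# A method such as M1 ("lnc.ltn.all") then ranks documents by the inner
# product of smart_weights(doc_tf, df, N, "lnc") and
# smart_weights(query_tf, df, N, "ltn") over their common features.
```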
First, we compare the retrieval effectiveness of our
method (M1) described in Section 2 to the standard
tf * idf method (M0) by means of the precision-recall
graph in Figure 2. As expected, method M1 is more
effective than M0 and achieves a retrieval effectiveness
among the best methods presented at TREC-2. To find
the reason for this difference in retrieval effectiveness,
we must take a closer look at the influence of each
parameter (document and query feature weighting).
In Figure 3, the 11-point average precisions of each
method (M0 to M8) are plotted on the left axis, and
they are connected to the median response times (for
the top ranked document) plotted on the right axis. The
most obvious conclusion from this graph is the following:
the higher the precision, the slower the response,
and vice versa. The method M0 performs clearly worse
than the methods M1 to M8 with respect to both retrieval
effectiveness and retrieval efficiency.
We concentrate on the response time of the top
ranked document because the response times of all further
ranked documents are of secondary interest: a user is
supposed to read the top ranked document before looking
at the other documents, and the retrieval system can
retrieve further documents while the user is reading the
top ranked one.
Figure 3 also shows the influence of the different
parameters on the average precision.
Regarding the weighting of the document features,
the "lnc" weighting achieves a 4-10% higher precision
than the "ltc" weighting. The "ntc" weighting loses 5% of
precision compared to "ltc". In the case of query feature
weighting, again the logarithmic "ltn" weighting is more