NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)
D. K. Harman (editor), National Institute of Standards and Technology

Retrieval of Partial Documents
A. Moffat, R. Sacks-Davis, R. Wilkinson, J. Zobel
the face of non-exhaustive relevance judgements. When precision rates are around 30%, and a further 30% of documents (in the L = 1,000 case) are unjudged, no significance whatsoever can be attached to the difference between even 30% precision and 40% precision. Indeed, assuming that 27.1% of the unjudged documents are relevant for the "L = 1,000; Doc" combination gives a final precision of 0.411; the corresponding figure for the "All; Doc" pairing is only 0.373. Thus, the precision figures of Table 1 are sufficiently imprecise that no conclusion can be drawn about the appropriate value of L, or about the merits of document vs. paged retrieval. There is clearly scope for research into other methodologies for comparing retrieval mechanisms.
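To make the adjustment behind these figures explicit, the calculation can be written as follows; the base precision of roughly 0.33 is inferred by working backwards from the quoted 0.411, not taken directly from Table 1:

\[
P_{\mathrm{adj}} \;=\; P + r\,u \;\approx\; 0.33 + 0.271 \times 0.30 \;\approx\; 0.411 ,
\]

where P is the measured precision with unjudged documents counted as irrelevant, u is the fraction of retrieved documents that are unjudged, and r is the assumed proportion of unjudged documents that are in fact relevant.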
3 Structured documents
Many of the documents in the TREC collection are very large and have explicit structure, and it may be possible to use this structure, rather than the statistically based pagination methods described above, to break documents into parts. In particular, many documents can be broken up into a set of sections, each section having a type.

There has been relatively little work done on retrieving or ranking partial documents. However, Salton et al. [12] have demonstrated that document structure can be valuable. Sometimes this structure is explicitly available [2], and sometimes it has to be discovered [5], but the knowledge of this structure has been shown to help determine the relevance of sub-documents. In this part of the work we used a small database to investigate whether retrieval of sections helped document retrieval, and whether retrieval of documents helped section retrieval. By way of a benchmark, the paged retrieval techniques described earlier were applied to the same database.
3.1 The database
Since we needed information about the relevance of sections to queries, it was not possible to use the full TREC database. Instead, we used a database of 4,000 documents extracted from the Federal Register collection. Of these, 2,000 were the largest documents relevant to at least one of topics 51-100 provided for the first TREC experiment; the other 2,000 were randomly selected from the Federal Register collection to provide both smaller documents and non-relevant documents. The average number of words in these documents was 3,260.
These documents were then split into sections based on their internal markup. The documents had a number of tags inserted that defined an internal structure. It appeared that only the T2 and T3 tags could be reliably used to indicate a new internal fragment. Section breaks were defined to be a blank line, or a line containing only markup, followed by a T2 or a T3 tag. This led to a database of 32,737 sections. Each of these sections had a type based on its tag. The types were (purpose), (abstract), (start), (summary), (title), (supplementary), and a general category (misc) that included all remaining categories.
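As a sketch of the splitting heuristic just described, the boundary test can be expressed as follows; the exact tag syntax of the Federal Register markup is an assumption here, not a detail taken from the paper.

import re

# Assumed tag syntax: a section break is a blank line, or a line containing
# only markup, followed by a line opening with a T2 or T3 tag.
MARKUP_ONLY = re.compile(r'^\s*(<[^>]+>\s*)+$')   # line holding nothing but tags
SECTION_TAG = re.compile(r'^\s*<(T2|T3)>')        # line that opens a new fragment

def split_into_sections(lines):
    """Split a document, given as a list of text lines, into sections."""
    sections, current = [], []
    for prev, line in zip([''] + lines, lines):
        prev_is_break = prev.strip() == '' or MARKUP_ONLY.match(prev) is not None
        if current and prev_is_break and SECTION_TAG.match(line):
            sections.append('\n'.join(current))
            current = []
        current.append(line)
    if current:
        sections.append('\n'.join(current))
    return sections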
After the document selection had been made, only 19 of queries 51-100 had a relevant document in the collection. Each of the sections of the documents that had been judged relevant was then judged for relevance against these queries, so that finer grained retrieval experiments were possible. One difficulty that arose was that quite a few documents that had been judged relevant appeared to have no relevant sections: there were relevant key terms in the documents, but the documents themselves did not appear to address the information requirement. There were 145 such (query, document) pairs. To be consistent, we took these documents to be irrelevant. After these alterations, only 14 queries had a relevant section, with an average of 23 relevant sections per query.
3.2 Structured ranking
We carried out a set of experiments on ranking documents using the retrieval of sections. We first compared simple ranking of documents against ranking sections to find relevant documents. Next, a set of formulae was devised that attempted to exploit the fact that one document may have several sections, ranked more or less highly. These formulae took into consideration the rank of each section, the number of ranked sections, and the number of sections in the document. Experiment 3 describes one of the more successful formulae.
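The formula of Experiment 3 is not reproduced here, but a sketch of one plausible combining rule, using the three quantities just mentioned, might look as follows; the particular weighting is illustrative only and is not the measure used in the experiments.

def document_score(section_ranks, num_sections):
    """Combine the ranks of a document's retrieved sections into one
    document score.  section_ranks holds the (1-based) positions at which
    the document's sections appear in the section ranking; num_sections is
    the total number of sections in the document.  Illustrative weighting,
    not the formula of Experiment 3."""
    if not section_ranks:
        return 0.0
    # Highly ranked sections contribute more (reciprocal rank), and the
    # coverage factor rewards documents in which a large proportion of the
    # sections were retrieved, so that long documents are not favoured
    # merely for having many sections.
    rank_component = sum(1.0 / r for r in section_ranks)
    coverage = len(section_ranks) / num_sections
    return rank_component * (1.0 + coverage)

# For example, a 10-section document whose sections appear at ranks 2 and 7:
# document_score([2, 7], 10) = (1/2 + 1/7) * (1 + 2/10) ≈ 0.77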
Further trials were then performed using the type of the section. First, a set of experiments was run to determine which types were better predictors of relevance. These results were then used to devise a measure that applied a weight to each type. Finally, we tried to combine these