SP500207
NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
A Boolean Approximation Method for Query Construction and Topic Assignment in TREC
chapter
P. Jacobs
G. Krupka
L. Rau
National Institute of Standards and Technology
Donna K. Harman
complex and sophisticated methods, so long as enough knowledge is used in
constructing the Boolean expressions.
                          AD HOC TEST                       ROUTING TEST
                     11-pt.  Rel. @     Rel./           11-pt.  Rel. @     Rel./
                     avg.    100 docs.  Retrieved       avg.    100 docs.  Retrieved
Boolean pre-filter   .2029   47.2       .46             .2078   35.6       .34
Pattern matcher      .1961   46.2       .46             .1851   34.6       .37
Median for all runs  .1585   39.7                       .1246   28.6

                     top     above   below              top     above   below
                     score   median  median  equal      score   median  median  equal
Boolean pre-filter   5       28      15      2          5       31      6       7
Pattern matcher      5       26      17      2          6       27      10      6
Figure 1: GE TREC test results
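
For readers unfamiliar with the "11-pt. avg." column, the following is a
minimal sketch of how such a figure is commonly computed for a single query,
assuming the standard interpolated definition (precision averaged over the
recall levels 0.0, 0.1, ..., 1.0). The function name and input format are
illustrative assumptions, not the evaluation code used for TREC.

```python
def eleven_point_average(ranked_relevance, total_relevant):
    """Interpolated 11-point average precision for one query.

    ranked_relevance: list of booleans in rank order (True = relevant).
    total_relevant:   number of relevant documents known for the query.
    Both names are hypothetical; this is a sketch, not the official scorer.
    """
    if total_relevant == 0:
        return 0.0

    # Record (relevant documents seen so far, precision) at every rank.
    points = []
    hits = 0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
        points.append((hits, hits / rank))

    # At each recall level i/10, take the best precision achieved at any
    # rank whose recall is at least that level (0.0 if never reached).
    # Recall is compared with integers to avoid floating-point surprises.
    interpolated = []
    for i in range(11):
        reached = [p for h, p in points if h * 10 >= i * total_relevant]
        interpolated.append(max(reached) if reached else 0.0)

    return sum(interpolated) / 11
```

A score reported for a whole run would then be the mean of this value over
all queries in the test.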
5 Limitations and Future Work
While the overall relative results were generally strong, the system as
implemented had some basic flaws that should be easy to correct. One
characteristic of this method is that it seems to produce excellent results on
the queries that retrieve large volumes of data, and tends to produce almost
nothing on some of the narrowest queries. It also produces both high precision
and high recall for the texts that match, but makes it difficult to "loosen up"
to let in more texts. Since the match relies on at least some exact match
between query terms and texts, some queries (for example, one about the details
of rewriteable disks) produced no hits. By contrast, in one configuration, the
system assigned over 4,700 texts to one query ("sightings of 1988 presidential
candidates"), of which only 200 were included in the submitted results. This
might be desirable behavior for a routing system, but in the TREC style of
evaluation the system did badly on those queries where it failed to produce at
least 200 documents. On the other hand, the system produced at least 6000
responses in even its strictest configuration, or at least 120 texts per query
on average.
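
The effect of the 200-document limit can be illustrated with a short sketch.
The function and query labels below are hypothetical; the counts of 4,700 and
0 echo the two queries mentioned above, and the limit of 200 is the number of
documents per query included in the submitted results.

```python
CUTOFF = 200  # documents per query included in the submitted results


def summarize(hits_per_query, cutoff=CUTOFF):
    """Summarize how a per-query cutoff affects a set of match counts.

    hits_per_query: dict mapping a query label to the number of texts the
    system assigned to it (labels and counts here are illustrative only).
    Returns (submitted counts, queries falling short of the cutoff,
    mean number of matches per query before truncation).
    """
    # Anything beyond the cutoff is discarded before submission.
    submitted = {q: min(n, cutoff) for q, n in hits_per_query.items()}
    # Queries with fewer matches than the cutoff cannot fill their quota.
    short = sorted(q for q, n in hits_per_query.items() if n < cutoff)
    total = sum(hits_per_query.values())
    mean = total / len(hits_per_query) if hits_per_query else 0.0
    return submitted, short, mean


example = {"candidate sightings": 4700, "rewriteable disks": 0}
submitted, short, mean = summarize(example)
# The broad query is truncated to 200; the narrow one submits nothing.
```

The same arithmetic underlies the closing figures: 6000 responses at an
average of 120 per query corresponds to 6000 / 120 = 50 queries.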