NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
National Institute of Standards and Technology
Donna K. Harman, editor

A Boolean Approximation Method for Query Construction and Topic Assignment in TREC
P. Jacobs, G. Krupka, L. Rau

complex and sophisticated methods, so long as enough knowledge is used in constructing the Boolean expressions.

                      AD HOC TEST                      ROUTING TEST
                      11-pt.   Rel.@     Rel./         11-pt.   Rel.@     Rel./
                      avg.     100 docs  Retrieved     avg.     100 docs  Retrieved
Boolean pre-filter    .2029    47.2      .46           .2078    35.6      .34
Pattern matcher       .1961    46.2      .46           .1851    34.6      .37
Median for all runs   .1585    39.7                    .1246    28.6

                      AD HOC TEST                      ROUTING TEST
                      top    above   below   equal     top    above   below   equal
                      score  median  median            score  median  median
Boolean pre-filter    5      28      15      2         5      31      6       7
Pattern matcher       5      26      17      2         6      27      10      6

Figure 1: GE TREC test results

5 Limitations and Future Work

While the overall relative results were generally strong, the system as implemented had some basic flaws that should be easy to correct. One characteristic of this method is that it seems to produce excellent results on queries that match large volumes of text, but tends to produce almost nothing on some of the narrowest queries. It also yields both high precision and high recall for the texts that match, but it is difficult to "loosen up" the method to let in more texts. Because the match requires at least some exact overlap between query terms and text, some queries (for example, one about the details of rewriteable disks) produced no hits. By contrast, in one configuration the system assigned over 4,700 texts to a single query ("sightings of 1988 presidential candidates"), of which only 200 were included in the submitted results. This might be desirable behavior for a routing system, but under the TREC style of evaluation the system did badly on those queries where it failed to produce at least 200 documents.
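The all-or-nothing behavior described above can be illustrated with a minimal Boolean exact-match filter. This is a sketch under stated assumptions, not the GE system: the query format (a conjunction of OR-groups of literal terms), the helper names `tokenize`, `matches`, and `boolean_filter`, and the two toy documents are all hypothetical. A document either satisfies every group through an exact term match or is rejected outright.

```python
# Hypothetical sketch of a Boolean exact-match filter. The query format
# (AND of OR-groups) is an assumption; the actual GE Boolean expressions
# are not specified in this section.
import re


def tokenize(text):
    """Lowercase word tokens; no stemming or synonym expansion."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def matches(query, text):
    """query: list of OR-groups; every group must share an exact term with text."""
    tokens = tokenize(text)
    return all(any(term in tokens for term in group) for group in query)


def boolean_filter(query, docs):
    """Return ids of documents that satisfy the query; there is no partial credit."""
    return [doc_id for doc_id, text in docs.items() if matches(query, text)]


# Invented toy documents (not TREC data).
docs = {
    "d1": "Bush and Dukakis campaign sightings in 1988",
    "d2": "Optical rewritable disc drives ship",
}

# Spelling variance ("rewriteable" vs "rewritable", "disk" vs "disc") yields no hits.
print(boolean_filter([["rewriteable"], ["disk", "disks"]], docs))  # []
print(boolean_filter([["bush", "dukakis"], ["campaign"]], docs))   # ['d1']
```

Because `matches` returns only `True` or `False`, there is no score threshold to lower; in a filter of this kind, the only way to let in more texts is to edit the expression itself, for instance by adding variants such as `rewritable` and `disc` or by dropping a group.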
On the other hand, even in its strictest configuration the system produced at least 6000 responses, an average of at least 120 texts per query.
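The arithmetic behind the last sentence, and why an average can hide queries that fall short, can be sketched as follows. It assumes the 50 topics of the TREC-1 task (the denominator that makes 6000 and 120 consistent) and a cutoff of 200 submitted documents per query, as described in the preceding paragraph; the helper names `submitted` and `short_queries` and the four hit counts are invented for illustration and are not GE or TREC data.

```python
# Hypothetical sketch: per-query submission counts under a fixed cutoff.
# Helper names and hit counts are invented; they are not GE or TREC data.
SUBMISSION_CUTOFF = 200  # documents submitted per query, as described in the text


def submitted(hits, cutoff=SUBMISSION_CUTOFF):
    """Documents actually submitted per query; hits beyond the cutoff are dropped."""
    return [min(h, cutoff) for h in hits]


def short_queries(hits, cutoff=SUBMISSION_CUTOFF):
    """Indices of queries whose hits cannot fill the cutoff."""
    return [i for i, h in enumerate(hits) if h < cutoff]


# Lower bound on the average from the text, assuming 50 queries.
print(6000 / 50)  # 120.0

hits = [4700, 0, 150, 300]       # invented counts for four queries
print(sum(hits) / len(hits))     # 1287.5
print(submitted(hits))           # [200, 0, 150, 200]
print(short_queries(hits))       # [1, 2]
```

With these invented counts the mean number of hits is far above 120, yet two of the four queries cannot fill the 200-document cutoff; those are the queries on which, as noted above, the system did badly under TREC evaluation.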