SP500207
NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
A Boolean Approximation Method for Query Construction and Topic Assignment in TREC
chapter
P. Jacobs
G. Krupka
L. Rau
National Institute of Standards and Technology
Donna K. Harman
2134 TERM SANCTION
2135 TERM SANCTIONS
2136 TERM DISINVESTMENT
2138 TERM SULLIVAN
2139 TERM PRINCIPLES
2141 TERM PUNITIVE
2144 TERM BUTHELEZI
2145 TERM PRETORIA
2146 TERM ANTI-APARTHEID
2147 TERM APARTHEID
2149 TERM DE
2150 TERM KLERK
2152 TERM SOUTH
2153 TERM AFRICA
2137 OR 2134 2135 2136
2140 AND 2138 2139
2142 AND 2141 2029
2143 OR 2137 2140 2142
2148 OR 2144 2145 2146 2147
2151 AND 2149 2150
2154 OR 2153 52
2155 AND 2152 2154
2156 OR 2148 2151 2155
2157 AND 2143 2156
2158 AND 2156 2143
TOPIC052 OR 2157 2158
Each line in the above data gives a unique number (or topic designator) to
the test, a test identifier (either TERM for a simple word test, OR, or AND),
and a list of simple terms or previous tests. For example, test 2137 depends on
tests 2134, 2135, and 2136, and is true if any of those tests is true, namely, if
the text includes any of the words sanction, sanctions, or disinvestment. The
tests are automatically ordered so that all tests that are dependent on other
tests will have higher numbers than the tests they depend on; thus all TERM
tests appear first. In this case, the TERM test AFRICAN appears with a much
lower number simply because it is used in many different queries.
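
The table format and ordering rule described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names TABLE, parse_tests, and evaluate are hypothetical, the table is a subset of the listing above, and treating a reference to a test that is not in the table (such as 2029) as false is an assumption made only so the excerpt can be evaluated on its own. Because every test has a higher number than the tests it depends on, one pass in ascending order is enough.

```python
# Illustrative sketch only; parse_tests and evaluate are hypothetical names.
TABLE = """\
2134 TERM SANCTION
2135 TERM SANCTIONS
2136 TERM DISINVESTMENT
2138 TERM SULLIVAN
2139 TERM PRINCIPLES
2141 TERM PUNITIVE
2137 OR 2134 2135 2136
2140 AND 2138 2139
2142 AND 2141 2029
2143 OR 2137 2140 2142
"""


def parse_tests(text):
    """Return a list of (test_id, kind, args) tuples, one per non-empty line."""
    tests = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        test_id, kind, args = fields[0], fields[1], fields[2:]
        if kind not in ("TERM", "OR", "AND"):
            raise ValueError(f"unknown test type: {kind}")
        tests.append((test_id, kind, args))
    return tests


def _order(test):
    """Numeric tests sort by number; topic designators sort last."""
    test_id = test[0]
    return (0, int(test_id)) if test_id.isdigit() else (1, 0)


def evaluate(tests, words):
    """Evaluate all tests for a bag of words in one ascending pass."""
    seen = {w.upper() for w in words}
    truth = {}
    for test_id, kind, args in sorted(tests, key=_order):
        if kind == "TERM":
            truth[test_id] = args[0] in seen
        elif kind == "OR":
            # Assumption: a test missing from the table counts as false.
            truth[test_id] = any(truth.get(a, False) for a in args)
        else:  # AND
            truth[test_id] = all(truth.get(a, False) for a in args)
    return truth


if __name__ == "__main__":
    result = evaluate(parse_tests(TABLE), "the Sullivan principles".split())
    print(result["2140"], result["2143"])
```

In this sketch, a text containing both "Sullivan" and "principles" satisfies test 2140 and therefore test 2143, while a text containing only "punitive" does not satisfy 2142, because test 2029 is not defined in the excerpt.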
The pre-filter, which can work either on complete documents or paragraphs,
goes through every word in its input and, using a fast table look-up, sets the
TERM tests to true for every word it encounters. At the end of input, either
the end of the paragraph or end of each document, it runs through the table of
possible tests from low numbers to high numbers and sets tests to true if their