NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
A Boolean Approximation Method for Query Construction and Topic Assignment in TREC
P. Jacobs, G. Krupka, L. Rau
National Institute of Standards and Technology
Donna K. Harman

    2134      TERM  SANCTION
    2135      TERM  SANCTIONS
    2136      TERM  DISINVESTMENT
    2138      TERM  SULLIVAN
    2139      TERM  PRINCIPLES
    2141      TERM  PUNITIVE
    2144      TERM  BUTHELEZI
    2145      TERM  PRETORIA
    2146      TERM  ANTI-APARTHEID
    2147      TERM  APARTHEID
    2149      TERM  DE
    2150      TERM  KLERK
    2152      TERM  SOUTH
    2153      TERM  AFRICA
    2137      OR    2134 2135 2136
    2140      AND   2138 2139
    2142      AND   2141 2029
    2143      OR    2137 2140 2142
    2148      OR    2144 2145 2146 2147
    2151      AND   2149 2150
    2154      OR    2153 52
    2155      AND   2152 2154
    2156      OR    2148 2151 2155
    2157      AND   2143 2156
    2158      AND   2156 2143
    TOPIC052  OR    2157 2158

Each line in the above data assigns a unique number (or topic designator) to a test, followed by the test type (TERM for a simple word test, OR, or AND) and either the word to match or the list of earlier tests it depends on. For example, test 2137 depends on tests 2134, 2135, and 2136, and is true if any of those tests is true, namely, if the text includes any of the words sanction, sanctions, or disinvestment.

The tests are automatically ordered so that every test that depends on other tests has a higher number than the tests it depends on; thus all TERM tests appear first. In this case, the TERM test AFRICAN appears with a much lower number simply because it is used in many different queries.

The pre-filter, which can work on either complete documents or paragraphs, goes through every word in its input and, using a fast table look-up, sets the corresponding TERM test to true for every word it encounters. At the end of its input, which is either the end of the paragraph or the end of the document, it runs through the table of possible tests from low numbers to high numbers and sets tests to true if their