SP500207 NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Appendix C: System Features
National Institute of Standards and Technology
Donna K. Harman

System Summary and Timing

TRW

General Comments

The timings should be the time to replicate runs from scratch, not including trial runs, etc. The times should also be reasonably accurate. This sometimes will be difficult, such as getting total time for document indexing of huge text sections, or manually building a knowledge base. Please do your best.

I. Construction of indices, knowledge bases, and other data structures (please describe all data structures that your system needs for searching)

A. Which of the following were used to build your data structures?
None--we used a free text scanning approach. CD-ROM data was decompressed and loaded onto magnetic disk in raw form.

B. Statistics on data structures built from TREC text (please fill out each applicable section)
--none

C. Data built from sources other than the input text
--none

II. Query construction (please fill out a section for each query construction method used)

A. Automatically built queries (ad hoc)
We performed some initial trials with building queries based on word frequency taken from documents from the initial relevance judgments supplied by NIST. Unfortunately, this appeared to lead us down a blind alley, perhaps because the initial judgments were not all that good. We are planning to try this again with the new judgments.

B. Manually constructed queries (ad hoc)
We primarily used this method for the TREC queries.

1. topic fields used

2. average time to build query (minutes)
The initial query would take a couple of minutes to manually form by cutting and pasting from the topic descriptions with a text editor.
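The frequency-based automatic query construction described in II.A above can be sketched roughly as follows. This is an illustrative reconstruction, not TRW's actual code: the function name, the stopword list, and the choice of the top-k terms are all assumptions.

```python
import re
from collections import Counter

# Illustrative stopword list; a real system would use a much larger one.
STOPWORDS = {"the", "a", "an", "of", "and", "or", "to", "in", "is", "was"}

def build_query(relevant_docs, k=10):
    """Pick the k most frequent non-stopword terms from the
    documents judged relevant, to use as query terms."""
    counts = Counter()
    for doc in relevant_docs:
        for term in re.findall(r"[a-z]+", doc.lower()):
            if term not in STOPWORDS:
                counts[term] += 1
    return [term for term, _ in counts.most_common(k)]
```

As II.A notes, the quality of such queries depends directly on the quality of the relevance judgments the term counts are drawn from.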
"Feedback" was the human looking at the retrieved documents, comparing with the sample good documents supplied by NIST, making independent judgments on document relevance, and refining the query in an iterative manner.

3. type of query builder
b. computer system expert

4. tools used to build query
No special tools
a. word frequency list
b. knowledge base browser (knowledge base described in part I)
(1) which structure from part I
c. other lexical tools (identify)

5. which of the following were used?
b. Boolean connectors (AND, OR, NOT)
c. proximity operators
d. addition of terms not included in topic
(1) source of terms
Additional terms were supplied by human based on outside knowledge or from reading the text.
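The Boolean connectors and proximity operators checked in II.B.5 combine as sketched below. This is a minimal illustrative evaluator over tokenized text, not the logic of TRW's hardware scanner; the function names and the example query are assumptions.

```python
def tokens(doc):
    """Crude whitespace tokenization, lowercased."""
    return doc.lower().split()

def contains(doc, term):
    """Simple term membership test (the building block for AND/OR/NOT)."""
    return term in tokens(doc)

def near(doc, t1, t2, window=3):
    """Proximity operator: both terms occur within `window` token
    positions of each other."""
    toks = tokens(doc)
    pos1 = [i for i, t in enumerate(toks) if t == t1]
    pos2 = [i for i, t in enumerate(toks) if t == t2]
    return any(abs(i - j) <= window for i in pos1 for j in pos2)

def matches(doc):
    """Hypothetical query: (oil NEAR spill) AND NOT refinery."""
    return near(doc, "oil", "spill") and not contains(doc, "refinery")
```

For example, `matches("a huge oil tanker spill off the coast")` is true, while a document mentioning "refinery" is excluded by the NOT clause.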