NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Appendix C: System Features
National Institute of Standards and Technology
Donna K. Harman
C. Data built from sources other than the input text -- no
II. Query construction
(please fill out a section for each query construction method used)
A. Automatically built queries (ad hoc)
1. topic fields used Description, Narrative, and Concepts.
2. total computer time to build query (cpu seconds) Vector queries--50 seconds for 50 topics
3. which of the following were used?
a. term weighting with weights based on terms in topics
Term weighting was used for vector queries.
c. proper noun identification algorithm As provided in SMART
f. tokenizer (recognizes dates, phone numbers, common patterns)
As provided in SMART
B. Manually constructed queries (ad hoc)
1. topic fields used Description, Narrative, and Concepts.
2. average time to build query (minutes) 3 mins/query
3. type of query builder
b. computer system expert
4. tools used to build query
b. knowledge base browser (knowledge base described in part I)
(1) which structure from part I
For some of our work we built a knowledge base to help suggest broader/narrower terms--added information can be provided if appropriate.
c. other lexical tools (identify) vi (editor)
5. which of the following were used?
b. Boolean connectors (AND, OR, NOT)
d. addition of terms not included in topic
(1) source of terms domain knowledge of experts
III. Searching
A. Total computer time to search (cpu seconds)
Approx. 4 minutes for each topic.
We did a full sequential pass through documents for this since we did not have enough disk space for the inverted file.
1. retrieval time (total cpu seconds between when a query enters the system until a list of document numbers is obtained)
2. ranking time (total cpu seconds to sort document list)
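The full sequential pass described above can be sketched as a brute-force scan that scores every document against the query and then sorts, with no inverted file involved. This is an illustrative sketch only (the function names and sparse-dictionary vectors are assumptions, not SMART's actual code):

```python
import math

def cosine(q, d):
    """Cosine similarity between two sparse term->weight dicts."""
    dot = sum(w * d.get(t, 0.0) for t, w in q.items())
    nq = math.sqrt(sum(w * w for w in q.values()))
    nd = math.sqrt(sum(w * w for w in d.values()))
    return dot / (nq * nd) if nq and nd else 0.0

def sequential_search(query_vec, doc_vecs, k=10):
    """Score every document in one sequential pass, then rank by score."""
    scores = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]
```

With an inverted file, only documents sharing a query term would be scored; the sequential scan trades that disk structure for cpu time, which matches the roughly 4 minutes per topic reported above.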
B. Which methods best describe your machine searching methods?
Methods: We used three main methods, and a scheme for combining the results from those runs.
1. vector space model
5. Boolean matching
6. fuzzy logic (include your definition) p-norm matching
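The p-norm model named above (Salton, Fox, and Wu's extended Boolean matching) interpolates between strict Boolean matching (large p) and vector-style averaging (p = 1) of term weights in [0, 1]. A minimal sketch of the standard unweighted-query formulas, not SMART's implementation:

```python
def pnorm_or(weights, p):
    """p-norm OR of term weights in [0,1]: ((w1^p + ... + wn^p)/n)^(1/p)."""
    return (sum(w ** p for w in weights) / len(weights)) ** (1.0 / p)

def pnorm_and(weights, p):
    """p-norm AND: 1 - (((1-w1)^p + ... + (1-wn)^p)/n)^(1/p)."""
    return 1.0 - (sum((1.0 - w) ** p for w in weights) / len(weights)) ** (1.0 / p)
```

At p = 1 both operators reduce to a simple average of weights; as p grows, OR approaches the maximum weight and AND approaches the minimum, recovering conventional fuzzy-Boolean behavior.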
C. What factors are included in your ranking?
We used several weighting methods in combination with the methods, to get a total of 8 runs that were the basis for our submission. We used binary weights, as well as:
1. term frequency
2. inverse document frequency
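The two factors above combine in the familiar tf*idf weight, where rare terms (high idf) that occur often in a document (high tf) get the largest weights. A minimal sketch with illustrative function names; this is the textbook formulation, not necessarily the exact SMART weighting variant used in these runs:

```python
import math
from collections import Counter

def idf(term, docs):
    """Inverse document frequency: log(N / df), where df counts docs containing term."""
    df = sum(1 for d in docs if term in d)
    return math.log(len(docs) / df) if df else 0.0

def tfidf_vector(doc, docs):
    """tf * idf weight for each term of a tokenized document."""
    tf = Counter(doc)
    return {t: tf[t] * idf(t, docs) for t in tf}
```

Binary weighting, also listed above, simply replaces tf[t] * idf(t, docs) with 1 for every term present in the document.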