NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Appendix C: System Features
National Institute of Standards and Technology
Donna K. Harman
C. Data built from sources other than the input text -- no
II. Query construction
(please fill out a section for each query construction method used)
A. Automatically built queries (ad hoc)
1. topic fields used Description, Narrative, and Concepts.
2. total computer time to build query (cpu seconds) Vector queries--50 seconds for 50 topics
3. which of the following were used?
a. term weighting with weights based on terms in topics
Term weighting was used for vector queries.
c. proper noun identification algorithm As provided in SMART
f. tokenizer (recognizes dates, phone numbers, common patterns)
As provided in SMART
B. Manually constructed queries (ad hoc)
1. topic fields used Description, Narrative, and Concepts.
2. average time to build query (minutes) 3 mins/query
3. type of query builder
b. computer system expert
4. tools used to build query
b. knowledge base browser (knowledge base described in part I)
(1) which structure from part I
For some of our work we built a knowledge base to help suggest broader/narrower terms--added information can be provided if appropriate.
c. other lexical tools (identify) vi (editor)
5. which of the following were used?
b. Boolean connectors (AND, OR, NOT)
d. addition of terms not included in topic
(1) source of terms domain knowledge of experts
III. Searching
A. Total computer time to search (cpu seconds)
Approx. 4 minutes for each topic.
We did a full sequential pass through documents for this since we did not have enough disk space for the inverted file.
1. retrieval time (total cpu seconds between when a query enters the system until a list of document numbers is obtained)
2. ranking time (total cpu seconds to sort document list)
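The full sequential pass described above can be sketched as a brute-force scan that scores every document against the query and then sorts, with no inverted file involved. This is an illustrative sketch only (the function names and sparse-dictionary vectors are assumptions, not SMART's actual code):

```python
import math

def cosine(q, d):
    """Cosine similarity between two sparse term->weight dicts."""
    dot = sum(w * d.get(t, 0.0) for t, w in q.items())
    nq = math.sqrt(sum(w * w for w in q.values()))
    nd = math.sqrt(sum(w * w for w in d.values()))
    return dot / (nq * nd) if nq and nd else 0.0

def sequential_search(query_vec, doc_vecs, k=10):
    """Score every document in one sequential pass, then rank by score."""
    scores = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]
```

With an inverted file, only documents sharing a query term would be scored; the sequential scan trades that disk structure for cpu time, which matches the roughly 4 minutes per topic reported above.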
B. Which methods best describe your machine searching methods?
Methods: We used three main methods, and a scheme for combining the results from those runs.
1. vector space model
5. Boolean matching
6. fuzzy logic (include your definition) p-norm matching
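The p-norm model named above (Salton, Fox, and Wu's extended Boolean matching) interpolates between strict Boolean matching (large p) and vector-style averaging (p = 1) of term weights in [0, 1]. A minimal sketch of the standard unweighted-query formulas, not SMART's implementation:

```python
def pnorm_or(weights, p):
    """p-norm OR of term weights in [0,1]: ((w1^p + ... + wn^p)/n)^(1/p)."""
    return (sum(w ** p for w in weights) / len(weights)) ** (1.0 / p)

def pnorm_and(weights, p):
    """p-norm AND: 1 - (((1-w1)^p + ... + (1-wn)^p)/n)^(1/p)."""
    return 1.0 - (sum((1.0 - w) ** p for w in weights) / len(weights)) ** (1.0 / p)
```

At p = 1 both operators reduce to a simple average of weights; as p grows, OR approaches the maximum weight and AND approaches the minimum, recovering conventional fuzzy-Boolean behavior.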
C. What factors are included in your ranking?
We used several weighting methods in combination with the methods, to get a total of 8 runs that were the basis for our submission. We used binary weights, as well as:
1. term frequency
2. inverse document frequency
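The two factors above combine in the familiar tf*idf weight, where rare terms (high idf) that occur often in a document (high tf) get the largest weights. A minimal sketch with illustrative function names; this is the textbook formulation, not necessarily the exact SMART weighting variant used in these runs:

```python
import math
from collections import Counter

def idf(term, docs):
    """Inverse document frequency: log(N / df), where df counts docs containing term."""
    df = sum(1 for d in docs if term in d)
    return math.log(len(docs) / df) if df else 0.0

def tfidf_vector(doc, docs):
    """tf * idf weight for each term of a tokenized document."""
    tf = Counter(doc)
    return {t: tf[t] * idf(t, docs) for t in tf}
```

Binary weighting, also listed above, simply replaces tf[t] * idf(t, docs) with 1 for every term present in the document.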