SP500207
NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Appendix C: System Features
appendix
National Institute of Standards and Technology
Donna K. Harman
Will eventually output vectors in the appropriate database format, and this
entire step can be omitted.
4. SVD calculations usually run on ~50,000 docs x nterms matrices. The
remaining docs (if any) were indexed and added to the database here.
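The SVD step described above can be sketched as follows. This is a toy illustration of standard LSI practice, not the system's actual code: the matrix, its values, and the fold-in projection for documents indexed after the SVD are all assumed for the example (real runs used matrices on the order of tens of thousands of documents).

```python
import numpy as np

# Toy term-document matrix: rows = terms, columns = documents.
A = np.array([
    [1.0, 0.0, 1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 1.0, 0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0, 1.0, 0.0],
])

k = 2  # number of latent dimensions to keep
U, s, Vt = np.linalg.svd(A, full_matrices=False)
Uk, sk = U[:, :k], s[:k]  # rank-k factors

# Documents indexed after the SVD are "folded in": project each
# new document's term vector into the existing k-dim space.
d_new = np.array([1.0, 0.0, 0.0, 1.0])  # term vector of a new doc
d_folded = d_new @ Uk / sk              # coordinates in LSI space

print(d_folded.shape)  # one coordinate per latent dimension: (2,)
```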
C. Data built from sources other than the input text -- no
II. Query construction
(please fill out a section for each query construction method used)
A. Automatically built queries (ad hoc) yes
Submitted two sets of ad hoc queries; queries were the same in both cases; only difference
was how information from different sub-collections was combined
1. topic fields used
all (except NO manually indexed terms used)
2. total computer time to build query (cpu seconds)
Queries are vector sums of constituent term vectors
Separate query vector created for matching against each of 9 databases (DOE,
WSJ1, AP1, FR1, ZIFF1, WSJ2, AP2, FR2, ZIFF2)
Time = .4 sec/query/database -> 3.6 secs/query
NOTE: These times simulate handling each query separately (so there is no i/o
buffering). There are big improvements if you initially read in all the term vectors
and create all the ad hoc queries at once.
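The query construction described above (a query vector built as the sum of its constituent term vectors, then matched against each sub-collection's document vectors) can be sketched roughly as follows. The term names, vector values, and the cosine-similarity match are invented for illustration; they are not taken from the system itself.

```python
import numpy as np

# Hypothetical term vectors in a reduced LSI space (k = 3 here).
term_vectors = {
    "nuclear": np.array([0.8, 0.1, 0.0]),
    "reactor": np.array([0.7, 0.2, 0.1]),
    "safety":  np.array([0.1, 0.9, 0.2]),
}

# An ad hoc query is simply the vector sum of its constituent terms.
query = sum(term_vectors[t] for t in ["nuclear", "reactor", "safety"])

# One such vector would be built per sub-collection (DOE, WSJ1, ...),
# then scored against that collection's document vectors.
doc = np.array([0.9, 0.5, 0.1])  # a hypothetical document vector
score = query @ doc / (np.linalg.norm(query) * np.linalg.norm(doc))
print(round(float(score), 3))
```

Reading all term vectors once and reusing them across queries is what gives the batch speedup mentioned in the note above.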
3. which of the following were used?
a. term weighting with weights based on terms in topics
term weighting, but weights based on term usage in document collections
b. expansion of queries using previously-constructed data structure (from part I)
not really
D. Automatically built queries (routing) yes
submitted two sets of routing queries. Both were automatically created from
1) the text of the topics and
2) the relevant documents
1. topic fields used
all (except NO manually indexed terms) for both 1) and 2)
2. total computer time to build query (cpu seconds)
Queries are vector sums of constituent term vectors [case 1)] or document vectors
[case 2)].
Separate query vector created for matching against each of 4 separate databases
(WSJ1, AP1, FR1, ZIFF1)
Time = .4 sec/query/database in case 1) -> 1.6 secs/query
Time = .1 sec/query/database in case 2) -> .4 secs/query
NOTE: These times simulate handling each query separately (so there is no i/o
buffering).
3. which of the following were used in building the query?
a. terms selected from
(1) topic case 1)
(3) only documents with relevance judgments case 2)
b. term weighting
(2) with weights based on terms in all training documents
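Case 2) above, where a routing query is built from documents with relevance judgments, can be sketched as a plain sum of document vectors. The vectors here are hypothetical values chosen for the example; the real system's vectors live in its LSI space.

```python
import numpy as np

# Hypothetical LSI-space vectors for documents judged relevant
# to a routing topic (case 2 above).
relevant_docs = [
    np.array([0.6, 0.3, 0.1]),
    np.array([0.5, 0.4, 0.0]),
    np.array([0.7, 0.2, 0.2]),
]

# Case 2: the routing query is the vector sum of the relevant
# documents' vectors (case 1 sums topic term vectors instead).
routing_query = np.sum(relevant_docs, axis=0)

print([round(x, 6) for x in routing_query.tolist()])  # [1.8, 0.9, 0.3]
```

Summing pre-built document vectors explains why case 2) is faster per query than case 1): no per-term lookup is needed, only one vector per relevant document.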
III. Searching