SP500207
NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Appendix C: System Features
appendix
National Institute of Standards and Technology
Donna K. Harman
Will eventually output vectors in the appropriate database format, and this
entire step can be omitted.
4. SVD calculations usually run on ~50,000 docs x nterms matrices. The
remaining docs (if any) were indexed and added to the database here.
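The SVD step described above can be sketched as follows. This is a toy illustration of standard LSI practice, not the system's actual code: the matrix, its values, and the fold-in projection for documents indexed after the SVD are all assumed for the example (real runs used matrices on the order of tens of thousands of documents).

```python
import numpy as np

# Toy term-document matrix: rows = terms, columns = documents.
A = np.array([
    [1.0, 0.0, 1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 1.0, 0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0, 1.0, 0.0],
])

k = 2  # number of latent dimensions to keep
U, s, Vt = np.linalg.svd(A, full_matrices=False)
Uk, sk = U[:, :k], s[:k]  # rank-k factors

# Documents indexed after the SVD are "folded in": project each
# new document's term vector into the existing k-dim space.
d_new = np.array([1.0, 0.0, 0.0, 1.0])  # term vector of a new doc
d_folded = d_new @ Uk / sk              # coordinates in LSI space

print(d_folded.shape)  # one coordinate per latent dimension: (2,)
```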
C. Data built from sources other than the input text -- no
II. Query construction
(please fill out a section for each query construction method used)
A. Automatically built queries (ad hoc) yes
Submitted two sets of ad hoc queries; queries were the same in both cases; only difference
was how information from different sub-collections was combined
1. topic fields used
all (except NO manually indexed terms used)
2. total computer time to build query (cpu seconds)
Queries are vector sums of constituent term vectors
Separate query vector created for matching against each of 9 databases (DOE,
WSJ1, AP1, FR1, ZIFF1, WSJ2, AP2, FR2, ZIFF2)
Time = .4 sec/query/database -> 3.6 secs/query
NOTE: These times simulate handling each query separately (so there is no i/o
buffering). There are big improvements if you initially read in all the term vectors
and create all the ad hoc queries at once.
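The query construction described above (a query vector built as the sum of its constituent term vectors, then matched against each sub-collection's document vectors) can be sketched roughly as follows. The term names, vector values, and the cosine-similarity match are invented for illustration; they are not taken from the system itself.

```python
import numpy as np

# Hypothetical term vectors in a reduced LSI space (k = 3 here).
term_vectors = {
    "nuclear": np.array([0.8, 0.1, 0.0]),
    "reactor": np.array([0.7, 0.2, 0.1]),
    "safety":  np.array([0.1, 0.9, 0.2]),
}

# An ad hoc query is simply the vector sum of its constituent terms.
query = sum(term_vectors[t] for t in ["nuclear", "reactor", "safety"])

# One such vector would be built per sub-collection (DOE, WSJ1, ...),
# then scored against that collection's document vectors.
doc = np.array([0.9, 0.5, 0.1])  # a hypothetical document vector
score = query @ doc / (np.linalg.norm(query) * np.linalg.norm(doc))
print(round(float(score), 3))
```

Reading all term vectors once and reusing them across queries is what gives the batch speedup mentioned in the note above.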
3. which of the following were used?
a. term weighting with weights based on terms in topics
term weighting, but weights based on term usage in document collections
b. expansion of queries using previously-constructed data structure (from part I)
not really
D. Automatically built queries (routing) yes
submitted two sets of routing queries. Both were automatically created from
1) the text of the topics and
2) the relevant documents
1. topic fields used
all (except NO manually indexed terms) for both 1) and 2)
2. total computer time to build query (cpu seconds)
Queries are vector sums of constituent term vectors [case 1)] or document vectors
[case 2)].
Separate query vector created for matching against each of 4 separate databases
(WSJ1, AP1, FR1, ZIFF1)
Time = .4 sec/query/database in case 1) -> 1.6 secs/query
Time = .1 sec/query/database in case 2) -> .4 secs/query
NOTE: These times simulate handling each query separately (so there is no i/o
buffering).
3. which of the following were used in building the query?
a. terms selected from
(1) topic case 1)
(3) only documents with relevance judgments case 2)
b. term weighting
(2) with weights based on terms in all training documents
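Case 2) above, where a routing query is built from documents with relevance judgments, can be sketched as a plain sum of document vectors. The vectors here are hypothetical values chosen for the example; the real system's vectors live in its LSI space.

```python
import numpy as np

# Hypothetical LSI-space vectors for documents judged relevant
# to a routing topic (case 2 above).
relevant_docs = [
    np.array([0.6, 0.3, 0.1]),
    np.array([0.5, 0.4, 0.0]),
    np.array([0.7, 0.2, 0.2]),
]

# Case 2: the routing query is the vector sum of the relevant
# documents' vectors (case 1 sums topic term vectors instead).
routing_query = np.sum(relevant_docs, axis=0)

print([round(x, 6) for x in routing_query.tolist()])  # [1.8, 0.9, 0.3]
```

Summing pre-built document vectors explains why case 2) is faster per query than case 1): no per-term lookup is needed, only one vector per relevant document.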
III. Searching