NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)

Appendix C: System Features
National Institute of Standards and Technology
Donna K. Harman

      b. total number of concepts represented: 250,000+ concepts, 1.5M links
      c. type of representation (frames, semantic nets, rules, etc.): Weighted semantic network
      d. total computer time to build (approximate number of hours): 0, already had it
      e. total manual time to build (approximate number of hours): 0
      f. use of manual labor
        (2) mostly machine built with manual correction: yes, but prior to TREC, not DB specific
      g. auxiliary files needed for machine use
        (1) machine-readable dictionary (which one?): Merriam Webster (abridged)
        (2) other (identify): WordNet, plus several thesaurus files

  C. Data built from sources other than the input text: See 3(g) above
    1. internally-built auxiliary files: Semantic Network
      a. domain independent or domain specific (if two separate files, please fill out one set of questions for each file)
      b. type of file (thesaurus, knowledge base, lexicon, etc.): All in one
      c. total amount of storage (megabytes): 12
      d. total number of concepts represented: 250,000+
      e. type of representation (frames, semantic nets, rules, etc.): Semantic net
      f. total computer time to build (approximate number of hours): Already had
        (1) if already built, how much time to modify for TREC? None
      g. total manual time to build (approximate number of hours): Already had
        (1) if already built, how much time to modify for TREC? None
      h. use of manual labor
        (2) mostly machine built with manual correction

II. Query construction (please fill out a section for each query construction method used)

  A. Automatically built queries (ad hoc)
    1. topic fields used: Used entire topic with some simple filtering
    2. total computer time to build query (cpu seconds): unknown, est. < 0.1 sec. ea.
    3. which of the following were used?
      a. term weighting with weights based on terms in topics
      b. phrase extraction from topics
      c. syntactic parsing of topics
      d. word sense disambiguation
      e. proper noun identification algorithm (look up)
      f. tokenizer (recognizes dates, phone numbers, common patterns)
        (1) which patterns are tokenized? many
      h. expansion of queries using previously-constructed data structure (from part I)
        (1) which structure? Tapered window

  B. Manually constructed queries (ad hoc)
    1. topic fields used: User judgment
    2. average time to build query (minutes): 1-5 minutes
    3. type of query builder
      b. computer system expert
    4. tools used to build query
      a. word frequency list: yes
      b. knowledge base browser (knowledge base described in part I): yes
        (1) which structure from part I
      c. other lexical tools (identify): Lexicon
    5. which of the following were used?