SP500207 NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1) Appendix C: System Features appendix National Institute of Standards and Technology Donna K. Harman

11. proper noun identification algorithm
none
12. tokenizer (recognizes dates, phone numbers, common patterns)
none
13. are the manually-indexed terms used?
none
14. other techniques used to build data structures (brief description)
none

B. Statistics on data structures built from TREC text (please fill out each applicable section)
1. inverted index
Based only on pairs, not individual terms.
a. total amount of storage (megabytes)
819 megabytes
b. total computer time to build (approximate number of hours)
100 hours
c. is the process completely automatic?
yes
d. are term positions within documents stored?
no
e. single terms only?
none
2. n-grams, suffix arrays, signature files
See B1.
C. Data built from sources other than the input text
-- no

II. Query construction (please fill out a section for each query construction method used)
A. Automatically built queries (ad hoc)
1. topic fields used
Title, Description, Narrative, and Concepts (only first two.)
2. total computer time to build query (cpu seconds)
0.26 seconds
3. which of the following were used?
none
D. Automatically built queries (routing)
1. topic fields used
Title, Description, Narrative, Concepts (first two).
2. total computer time to build query (cpu seconds)
55 seconds
3. which of the following were used in building the query?
c. phrase extraction
(2) from all training documents
Word pairs occurring in the relevant training documents for the query but not in the irrelevant documents were used.

III. Searching
A. Total computer time to search (cpu seconds)
1.
retrieval time (total cpu seconds between when a query enters the system until a list of document numbers is obtained)
This was not optimized for the current experiments. Run time was approximately 20 minutes per search. Proper optimization will reduce this time.
2. ranking time (total cpu seconds to sort document list)
.22 seconds
B. Which methods best describe your machine searching methods?
4. n-gram matching
C. What factors are included in your ranking?
11. n-gram frequency

IV. What machine did you conduct the TREC experiment on?
IBM 3090/300J
How much RAM did it have?
16 Meg for a virtual machine.
What was the clock rate of the CPU?
14.5 nanoseconds, or 69 MHz.

V. Some systems are research prototypes and others are commercial. To help compare these systems:
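Section B.1 states that the inverted index is keyed by word pairs rather than individual terms, and that term positions within documents are not stored. A minimal sketch of such a pair-based index is given below; the function name, tokenization, and data layout are assumptions for illustration, not the system's actual implementation.

```python
from collections import defaultdict

def build_pair_index(docs):
    # Inverted index over adjacent word pairs, not single terms (B.1).
    # Only document IDs are recorded; per the questionnaire (B.1.d),
    # within-document positions are not stored.
    # Tokenization by whitespace is an illustrative simplification.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        tokens = text.lower().split()
        for pair in zip(tokens, tokens[1:]):
            index[pair].add(doc_id)
    return dict(index)
```

For example, indexing `{"d1": "oil price shock", "d2": "price shock news"}` maps the pair `("price", "shock")` to both documents, while `("oil", "price")` maps only to `d1`.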
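Section II.D.3.c describes routing-query construction as selecting word pairs that occur in the relevant training documents for a query but not in the irrelevant ones. A hedged sketch of that set-difference selection follows; the function name and whitespace tokenization are hypothetical, standing in for whatever preprocessing the system actually used.

```python
def routing_query_pairs(relevant_docs, irrelevant_docs):
    # Keep word pairs seen in relevant training documents but in
    # none of the irrelevant ones (II.D.3.c). Names and tokenization
    # are illustrative assumptions, not the original implementation.
    def pairs_of(text):
        tokens = text.lower().split()
        return set(zip(tokens, tokens[1:]))

    rel, irr = set(), set()
    for doc in relevant_docs:
        rel |= pairs_of(doc)
    for doc in irrelevant_docs:
        irr |= pairs_of(doc)
    return rel - irr
```

With relevant text "stock market crash" and irrelevant text "market crash averted", the pair ("market", "crash") is discarded because it also appears on the irrelevant side, leaving only ("stock", "market") as a query pair.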