SP500207
NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Appendix C: System Features
National Institute of Standards and Technology
Donna K. Harman

   d. brief description of methods used: count word and non-word frequencies using a splay tree

other data structures built from TREC text (what?): a single file of the text itself, compressed
   a. total amount of storage (megabytes): 253.2 MB
   b. total computer time to build (approximate number of hours): 3.10 cpu hours
   c. is the process completely automatic? yes
   d. brief description of methods used: zero-order word-based model using Huffman coding

other data structures built from TREC text (what?): a file of document addresses and document lengths (for cosine)
   a. total amount of storage (megabytes): 1.8 MB
   b. total computer time to build (approximate number of hours): negligible

other data structures built from TREC text (what?): vocabulary for inverted index
   a. total amount of storage (megabytes): 3.6 MB
   b. total computer time to build (approximate number of hours): 2.41 cpu hours
   c. is the process completely automatic? yes
   d. brief description of methods used: count stemmed word frequencies using a splay tree

other data structures built from TREC text (what?): a file of inverted index entry addresses
   a. total amount of storage (megabytes): 1.2 MB
   b. total computer time to build (approximate number of hours): negligible

other data structures built from TREC text (what?): a file of approximate document lengths
   a. total amount of storage (megabytes): 0.2 MB
   b. total computer time to build (approximate number of hours): negligible

C. Data built from sources other than the input text: no

II. Query construction (please fill out a section for each query construction method used)
A. Automatically built queries (ad hoc)
   1. topic fields used: all
   2. total computer time to build query (cpu seconds): less than one second
   3. which of the following were used?
      a. term weighting with weights based on terms in topics: yes, as in cosine measure
      j. other (describe): used stop words to eliminate common words from the query; eliminated SGML tags and all punctuation

III. Searching
A. Total computer time to search (cpu seconds): 1 and 2 were not timed separately; 35 seconds per query to identify the top 200 ranked items; a further 4.6 cpu seconds to decompress the top 200 items (18.6 seconds in total, including retrieval time)
   1. retrieval time (total cpu seconds from when a query enters the system until a list of document numbers is obtained)
   2. ranking time (total cpu seconds to sort the document list)
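Both the compressed-text model and the inverted-index vocabulary were built by counting frequencies in a splay tree. The sketch below shows that counting step under stated assumptions: the tokenizer (runs of ASCII letters and digits are "words", every other run is a "non-word") and all function names are hypothetical, and stemming is omitted. The splay step is the standard recursive top-down formulation, not necessarily the one used in the system described here.

```python
import re
from typing import Iterator, List, Optional, Tuple

# Hypothetical tokenizer: group 1 matches a "word", group 2 a "non-word".
TOKEN_RE = re.compile(r"([A-Za-z0-9]+)|([^A-Za-z0-9]+)")


class Node:
    __slots__ = ("key", "count", "left", "right")

    def __init__(self, key: str) -> None:
        self.key = key
        self.count = 1
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None


def rotate_right(x: Node) -> Node:
    y = x.left
    x.left, y.right = y.right, x
    return y


def rotate_left(x: Node) -> Node:
    y = x.right
    x.right, y.left = y.left, x
    return y


def splay(root: Optional[Node], key: str) -> Optional[Node]:
    """Bring `key` (or the last node on its search path) to the root.
    Recursive for brevity; very deep trees may hit Python's recursion limit."""
    if root is None or root.key == key:
        return root
    if key < root.key:
        if root.left is None:
            return root
        if key < root.left.key:  # zig-zig
            root.left.left = splay(root.left.left, key)
            root = rotate_right(root)
        elif key > root.left.key:  # zig-zag
            root.left.right = splay(root.left.right, key)
            if root.left.right is not None:
                root.left = rotate_left(root.left)
        return root if root.left is None else rotate_right(root)
    if root.right is None:
        return root
    if key > root.right.key:  # zig-zig
        root.right.right = splay(root.right.right, key)
        root = rotate_left(root)
    elif key < root.right.key:  # zig-zag
        root.right.left = splay(root.right.left, key)
        if root.right.left is not None:
            root.right = rotate_right(root.right)
    return root if root.right is None else rotate_left(root)


def count(root: Optional[Node], key: str) -> Node:
    """Increment the frequency of `key`, inserting it if absent; returns the new root."""
    root = splay(root, key)
    if root is None:
        return Node(key)
    if root.key == key:
        root.count += 1
        return root
    node = Node(key)  # split the splayed tree around the new key
    if key < root.key:
        node.left, node.right, root.left = root.left, root, None
    else:
        node.right, node.left, root.right = root.right, root, None
    return node


def in_order(root: Optional[Node]) -> Iterator[Tuple[str, int]]:
    """Iterative in-order walk yielding (token, frequency) in sorted token order."""
    stack: List[Node] = []
    node = root
    while stack or node is not None:
        while node is not None:
            stack.append(node)
            node = node.left
        node = stack.pop()
        yield node.key, node.count
        node = node.right


def word_nonword_frequencies(text: str):
    words: Optional[Node] = None
    nonwords: Optional[Node] = None
    for word, nonword in TOKEN_RE.findall(text):
        if word:
            words = count(words, word)
        else:
            nonwords = count(nonwords, nonword)
    return list(in_order(words)), list(in_order(nonwords))
```

For example, `word_nonword_frequencies("the cat, the hat.")` yields the word counts `[("cat", 1), ("hat", 1), ("the", 2)]` and the non-word counts `[(" ", 2), (", ", 1), (".", 1)]`. A splay tree keeps recently seen tokens near the root, which suits the skewed frequency distribution of natural-language text.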
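The compressed text file is described as a zero-order word-based model using Huffman coding: each word is coded independently of its neighbours, with shorter codewords for more frequent words. A minimal sketch of that coding step, assuming word frequencies are already counted; the canonical codeword assignment and the function names are illustrative choices, not details taken from the appendix.

```python
import heapq
from typing import Dict, List


def huffman_code_lengths(freqs: Dict[str, int]) -> Dict[str, int]:
    """Codeword length for each word under a zero-order (context-free) Huffman code."""
    if len(freqs) == 1:
        return {sym: 1 for sym in freqs}
    # Heap entries: (subtree frequency, unique tie-breaker, symbols in the subtree).
    heap = [(f, i, [sym]) for i, (sym, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freqs, 0)
    next_id = len(heap)
    while len(heap) > 1:
        f1, _, syms1 = heapq.heappop(heap)
        f2, _, syms2 = heapq.heappop(heap)
        merged = syms1 + syms2
        for sym in merged:  # each symbol below the new internal node gets one bit longer
            lengths[sym] += 1
        heapq.heappush(heap, (f1 + f2, next_id, merged))
        next_id += 1
    return lengths


def canonical_codes(lengths: Dict[str, int]) -> Dict[str, str]:
    """Assign canonical codewords: shorter codes first, ties broken alphabetically."""
    codes: Dict[str, str] = {}
    code, prev_len = 0, 0
    for sym, length in sorted(lengths.items(), key=lambda kv: (kv[1], kv[0])):
        code <<= length - prev_len
        codes[sym] = format(code, f"0{length}b")
        code += 1
        prev_len = length
    return codes


def encode(words: List[str], codes: Dict[str, str]) -> str:
    """Concatenate the codewords of a word sequence into a bit string."""
    return "".join(codes[w] for w in words)
```

With frequencies `{"the": 5, "cat": 2, "hat": 1, "sat": 1}` the code lengths are 1, 2, 3 and 3 bits, giving the codewords `0`, `10`, `110` and `111`, so `["the", "cat", "sat"]` encodes to `010111`. Canonical codes are a common choice because a decoder needs only the lengths, not the whole tree.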
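The ad hoc queries were built automatically from all topic fields by removing SGML tags, punctuation and stop words. A minimal sketch of that preprocessing follows; the stop list, the case folding and the function name are assumptions (the appendix names neither the stop list nor the exact tokenization), and the stemming implied by the stemmed vocabulary is omitted.

```python
import re
from typing import List

# Hypothetical stop list; the appendix does not say which stop words were used.
STOP_WORDS = frozenset({
    "a", "an", "and", "are", "as", "be", "by", "for", "in",
    "is", "of", "on", "or", "that", "the", "to", "with",
})

SGML_TAG = re.compile(r"<[^>]*>")
NON_ALNUM = re.compile(r"[^A-Za-z0-9]+")


def query_terms(topic: str) -> List[str]:
    """Turn a whole TREC topic into query terms."""
    text = SGML_TAG.sub(" ", topic)          # drop <top>, <title>, ... tags
    text = NON_ALNUM.sub(" ", text).lower()  # punctuation -> space; case folding is assumed
    return [t for t in text.split() if t not in STOP_WORDS]
```

For instance, `query_terms("<top>\n<title> Topic: The Effects of Acid Rain\n</top>")` returns `["topic", "effects", "acid", "rain"]`. Field labels such as "Topic" survive unless they are added to the stop list.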
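Searching ranks documents by the cosine measure, using the stored document lengths, and keeps the top 200. The sketch below shows one conventional way to do this with an in-memory inverted index and per-document accumulators. The tf × idf weighting with `idf = log(1 + N / f_t)` is an assumption, since the appendix says only "as in cosine measure"; all names are hypothetical, and decompression of the retrieved documents is not shown.

```python
import heapq
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Postings = Dict[str, List[Tuple[int, int]]]  # term -> [(doc id, within-document frequency)]


def build_index(docs: List[List[str]]) -> Postings:
    index: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for doc_id, terms in enumerate(docs):
        for term, f_dt in Counter(terms).items():
            index[term].append((doc_id, f_dt))
    return dict(index)


def idf(n_docs: int, f_t: int) -> float:
    # Assumed weighting, not taken from the appendix.
    return math.log(1 + n_docs / f_t)


def doc_lengths(index: Postings, n_docs: int) -> List[float]:
    """Euclidean length of each document's weight vector (the cosine normalizer)."""
    sq = [0.0] * n_docs
    for postings in index.values():
        w = idf(n_docs, len(postings))
        for doc_id, f_dt in postings:
            sq[doc_id] += (f_dt * w) ** 2
    return [math.sqrt(s) for s in sq]


def rank(query: List[str], index: Postings, lengths: List[float],
         k: int = 200) -> List[Tuple[float, int]]:
    """Cosine ranking with accumulators; returns the top k (score, doc id) pairs."""
    n_docs = len(lengths)
    acc: Dict[int, float] = defaultdict(float)
    for term, f_qt in Counter(query).items():
        postings = index.get(term)
        if not postings:
            continue
        w = idf(n_docs, len(postings))
        for doc_id, f_dt in postings:
            acc[doc_id] += (f_qt * w) * (f_dt * w)
    # Dividing by the query length as well would not change the order, so it is skipped.
    scores = ((s / lengths[d], d) for d, s in acc.items() if lengths[d] > 0)
    return heapq.nlargest(k, scores)
```

With the documents `[["cat", "sat"], ["dog", "sat"], ["cat", "cat", "hat"]]` and the query `["cat"]`, document 2 ranks above document 0 and document 1 is not returned. `heapq.nlargest` selects the top k without sorting every accumulator, which matters when k (here 200) is much smaller than the number of candidate documents.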