NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Appendix C: System Features
National Institute of Standards and Technology
Donna K. Harman

2.9 hours to reweight doc vectors and produce inverted file
c. is the process completely automatic? yes
d. are term positions within documents stored? no
e. single terms only? no

5. other data structures built from TREC text (what?): Map from doc id to text location (also gives title for each doc)
a. total amount of storage (megabytes): 68 Mbytes
b. total computer time to build (approximate number of hours): Time to create included in inverted file creation above.
c. is the process completely automatic? yes

other data structures built from TREC text (what?): Map from internal concept to token string
a. total amount of storage (megabytes): 25 Mbytes
b. total computer time to build (approximate number of hours): Time to create included in inverted file creation above.
c. is the process completely automatic? yes

other data structures built from TREC text (what?): Phrase dictionary (controlled vocabulary). Phrases were adjacent non-stopwords, components stemmed, that occurred at least 25 times in the D1 document set.
a. total amount of storage (megabytes): 14 Mbytes to store dictionary.
b. total computer time to build (approximate number of hours): It took 5.8 hours to index D1, finding [illegible] phrases and their collection stats. Of those phrases, 158,[illegible]00 occurred at least 25 times.
c. is the process completely automatic?

C. Data built from sources other than the input text: None, other than stopword file.

II. Query construction (please fill out a section for each query construction method used)
A. Automatically built queries (ad hoc)
1. topic fields used: Topic, Nationality, Narrative, Concepts, Factors, Description
2. total computer time to build query (cpu seconds): 2.7 seconds
3. which of the following were used?
a. term weighting with weights based on terms in topics (idf)
b. phrase extraction from topics: yes, using controlled list of phrases

III. Searching
A. Total computer time to search (cpu seconds): 374 seconds (includes retrieval + ranking).
1. retrieval time (total cpu seconds between when a query enters the system and when a list of document numbers is obtained)
2. ranking time (total cpu seconds to sort document list)
B. Which methods best describe your machine searching methods?
1. vector space model
2. probabilistic model
C. What factors are included in your ranking?
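The phrase-dictionary entry (adjacent non-stopwords, components stemmed, kept only if they occur at least 25 times in D1) and the query-construction answers (idf term weighting, phrase extraction against the controlled list) can be illustrated with a small Python sketch. It is not the reporting system's code: the stopword list, the toy suffix-stripping `stem` function, the restriction to two-word phrases, the natural-log form idf = log(N / df), and all function names are assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; the form only says a stopword file was used.
STOPWORDS = {"a", "an", "and", "are", "by", "for", "from", "in", "is", "of", "on", "the", "to", "with"}


def stem(word):
    """Toy suffix stripper standing in for the unspecified stemmer (assumption)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def candidate_phrases(text):
    """Stemmed pairs of adjacent tokens where neither token is a stopword."""
    tokens = tokenize(text)
    return [
        (stem(left), stem(right))
        for left, right in zip(tokens, tokens[1:])
        if left not in STOPWORDS and right not in STOPWORDS
    ]


def build_phrase_dictionary(documents, min_count=25):
    """Controlled phrase list: candidates occurring at least min_count times."""
    counts = Counter()
    for doc in documents:
        counts.update(candidate_phrases(doc))
    return {phrase: n for phrase, n in counts.items() if n >= min_count}


def idf_weights(documents):
    """idf(t) = log(N / df(t)) over stemmed non-stopword terms (assumed formula)."""
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update({stem(t) for t in tokenize(doc) if t not in STOPWORDS})
    n_docs = len(documents)
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def build_query(topic_text, phrase_dict, idf):
    """Weight topic terms by idf; keep only topic phrases found in the controlled list."""
    terms = {stem(t) for t in tokenize(topic_text) if t not in STOPWORDS}
    term_weights = {t: idf[t] for t in terms if t in idf}
    phrases = sorted({p for p in candidate_phrases(topic_text) if p in phrase_dict})
    return term_weights, phrases


# Tiny made-up collection: 30 copies of one document, 10 of another.
docs = ["oil spills in coastal waters"] * 30 + ["coastal weather report"] * 10
phrase_dict = build_phrase_dictionary(docs, min_count=25)
idf = idf_weights(docs)
weights, query_phrases = build_query("Oil spills near coastal waters", phrase_dict, idf)
```

In this toy run only ("oil", "spill") and ("coastal", "water") clear the 25-occurrence threshold, so they are the only phrases the topic can contribute; "coastal" appears in every document and gets an idf of 0, while "near" never occurs in the collection and receives no weight.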