NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Appendix C: System Features
National Institute of Standards and Technology
Donna K. Harman
11. proper noun identification algorithm none
12. tokenizer (recognizes dates, phone numbers, common patterns) none
13. are the manually-indexed terms used? none
14. other techniques used to build data structures (brief description) none
B. Statistics on data structures built from TREC text (please fill out each applicable section)
1. inverted index Based only on pairs, not individual terms.
a. total amount of storage (megabytes) 819 megabytes
b. total computer time to build (approximate number of hours) 100 hours
c. is the process completely automatic? yes
d. are term positions within documents stored? no
e. single terms only? none
2. n-grams, suffix arrays, signature files See B1.
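The pairs-only inverted index described in B.1 can be sketched as follows. This is a hypothetical Python illustration, not the system's actual code: the tokenization, document representation, and function name are assumptions. Each adjacent word pair maps to the set of documents containing it, and, matching the answers above, no within-document positions are stored.

```python
from collections import defaultdict

def build_pair_index(docs):
    """Build an inverted index keyed on adjacent word pairs.

    `docs` maps a document id to its token list. Postings hold only
    document ids (no term positions), as in the answers above.
    Illustrative sketch only.
    """
    index = defaultdict(set)
    for doc_id, tokens in docs.items():
        for pair in zip(tokens, tokens[1:]):
            index[pair].add(doc_id)
    return index

docs = {
    1: "text retrieval conference".split(),
    2: "text retrieval systems".split(),
}
index = build_pair_index(docs)
print(sorted(index[("text", "retrieval")]))  # -> [1, 2]
```

Indexing pairs rather than single terms trades a larger vocabulary (hence the 819-megabyte index) for more selective postings lists.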
C. Data built from sources other than the input text -- no
II. Query construction
(please fill out a section for each query construction method used)
A. Automatically built queries (ad hoc)
1. topic fields used Title, Description, Narrative, and Concepts (only first two)
2. total computer time to build query (cpu seconds) 0.26 seconds
3. which of the following were used? none
D. Automatically built queries (routing)
1. topic fields used Title, Description, Narrative, Concepts (first two).
2. total computer time to build query (cpu seconds) 55 seconds
3. which of the following were used in building the query?
c. phrase extraction
(2) from all training documents
Word pairs occurring in the relevant training documents for the query but not in the irrelevant documents were used.
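The pair-selection rule just described can be sketched as a set difference. This is an assumed illustration; the document representation (token lists) and function names are hypothetical, not taken from the system.

```python
def adjacent_pairs(tokens):
    """All adjacent word pairs in a token list."""
    return set(zip(tokens, tokens[1:]))

def select_query_pairs(relevant_docs, irrelevant_docs):
    """Word pairs seen in at least one relevant training document
    and in no irrelevant one, per the rule stated above.
    Sketch only; the actual system's details are not given."""
    relevant = set().union(*(adjacent_pairs(d) for d in relevant_docs))
    irrelevant = set().union(*(adjacent_pairs(d) for d in irrelevant_docs))
    return relevant - irrelevant

rel = ["text retrieval conference".split()]
irr = ["retrieval conference proceedings".split()]
print(select_query_pairs(rel, irr))  # -> {('text', 'retrieval')}
```

Subtracting pairs found in irrelevant documents keeps only pairs that discriminate in favor of the topic.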
III. Searching
A. Total computer time to search (cpu seconds)
1. retrieval time (total cpu seconds between when a query enters the system until a list of document numbers is obtained)
This was not optimized for the current experiments. Run time was approximately 20 minutes per search. Proper optimization will reduce this time.
2. ranking time (total cpu seconds to sort document list) 0.22 seconds
B. Which methods best describe your machine searching methods?
4. n-gram matching
C. What factors are included in your ranking?
11. n-gram frequency
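The n-gram matching and n-gram-frequency ranking factor checked above could be sketched as below, under the assumption that the n-grams are word-level pairs consistent with the index in B.1. The scoring function and its name are illustrative guesses; the form does not state the system's actual weighting.

```python
from collections import Counter

def word_ngrams(tokens, n=2):
    """Multiset of word n-grams in a token list."""
    return Counter(zip(*(tokens[i:] for i in range(n))))

def ngram_frequency_score(query_tokens, doc_tokens, n=2):
    """Ranking factor sketch: total frequency in the document of the
    n-grams that appear in the query. Assumed, not the system's
    documented formula."""
    query_grams = word_ngrams(query_tokens, n)
    doc_grams = word_ngrams(doc_tokens, n)
    return sum(doc_grams[g] for g in query_grams)

score = ngram_frequency_score(
    "text retrieval".split(),
    "text retrieval and text retrieval again".split(),
)
print(score)  # -> 2
```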
IV. What machine did you conduct the TREC experiment on? IBM 3090/300J
How much RAM did it have? 16 Meg for a virtual machine.
What was the clock rate of the CPU? 14.5 nanoseconds, or 69 MHz.
V. Some systems are research prototypes and others are commercial.
To help compare these systems: