NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)

Appendix C: System Features
National Institute of Standards and Technology
Donna K. Harman

      3. which of the following were used?
         a. term weighting with weights based on terms in topics
            Yes, as described above
         d. word sense disambiguation
            Only as described above (two original query terms must agree on a synonym to be added).
         h. expansion of queries using previously-constructed data structure (from part I)
            (1) which structure?
                WordNet.

III. Searching
   A. Total computer time to search (CPU seconds)
      1. retrieval time (total CPU seconds between when a query enters the system until a list of document numbers are obtained)
         15 CPU seconds, on average (756.4 CPU seconds to process 50 queries)
      2. ranking time (total CPU seconds to sort document list)
         not applicable: list of top 200 similarities maintained while searching
   B. Which methods best describe your machine searching methods?
      1. vector space model
   C. What factors are included in your ranking?
      1. term frequency
      2. inverse document frequency
      4. semantic closeness (as in semantic net distance) (synonyms)
      9. document length
      13. word sense frequency (nouns with only one sense in WordNet get all their synonyms added)

IV. What machine did you conduct the TREC experiment on?
    Sun IPX
    How much RAM did it have?
    64 megabytes
    What was the clock rate of the CPU?
    40 MHz

V. Some systems are research prototypes and others are commercial. To help compare these systems:
   1. How much "software engineering" went into the development of your system?
      Our system is a version of SMART with modified indexing code. SMART has been well-engineered (but its main goal is flexibility, not raw speed). Little time was spent optimizing our modifications.
   2. Given appropriate resources, could your system be made to run faster? By how much (estimate)?
      SMART could probably be made to run somewhat faster if it were made less flexible, that is, if we coded a version that performed only the sorts of runs we made here. I doubt the difference would be dramatic. Preprocessing steps performed on WordNet could improve the efficiency of the expansion code.
   3. What features is your system missing that it would benefit by if it had them?
      Incorporating part-of-speech tagging so that we could know if the term is a noun before looking it up in WordNet should be beneficial (we didn't do this for TREC because the tagger we have is fairly slow). In the same vein, a true sense disambiguator--a way of picking the correct WordNet synonym set--would clearly help, but I don't know of a way of doing that automatically yet (it is part of our research).
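
The responses above mention two query expansion rules: a noun with only one sense in WordNet gets all of its synonyms added, and otherwise a synonym is added only when two original query terms agree on it. The sketch below is one possible reading of those rules, not the SMART code used for the runs. The synonym table `TOY_SYNSETS` and the function `expand_query` are hypothetical, the data is made up rather than taken from WordNet, and "agree" is assumed to mean that the synonym appears in the synonym sets of at least two distinct query terms.

```python
from collections import Counter

# Hypothetical toy data standing in for WordNet noun synonym sets.
# Each term maps to a list of senses; each sense is a set of synonyms.
TOY_SYNSETS = {
    "treaty": [{"treaty", "pact", "accord"}],                      # one sense
    "bank": [{"bank", "depository"}, {"bank", "shore", "coast"}],  # two senses
    "seaside": [{"seaside", "coast"}, {"seaside", "resort"}],      # two senses
}


def expand_query(terms, synsets):
    """Return the original terms followed by the added synonyms (sorted).

    Rule 1: a term with exactly one sense contributes all its synonyms.
    Rule 2 (assumed reading of "agree"): any synonym proposed by at least
    two distinct original query terms is added.
    """
    terms = list(dict.fromkeys(terms))  # drop duplicates, keep order
    added = set()
    votes = Counter()

    for term in terms:
        senses = synsets.get(term, [])
        if len(senses) == 1:
            # Unambiguous term: take every synonym of its single sense.
            added |= senses[0] - {term}
        # Each term votes at most once for each candidate synonym.
        candidates = set().union(*senses) - {term}
        for candidate in candidates:
            votes[candidate] += 1

    # Keep candidates that at least two query terms agree on.
    added |= {cand for cand, count in votes.items() if count >= 2}
    added -= set(terms)  # never re-add an original query term
    return terms + sorted(added)


if __name__ == "__main__":
    print(expand_query(["treaty", "bank"], TOY_SYNSETS))
    print(expand_query(["bank", "seaside"], TOY_SYNSETS))
```

In the first call only "treaty" is unambiguous, so only its synonyms are added. In the second call both terms are ambiguous, and "coast" is added because both terms list it. The sketch does not show how added terms are weighted.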