SP500207
NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Appendix C: System Features
appendix
National Institute of Standards and Technology
Donna K. Harman
3. Which of the following were used?
a. term weighting with weights based on terms in topics: Yes, as described above
d. word sense disambiguation
Only as described above (two original query terms must agree on a synonym
to be added).
h. expansion of queries using previously-constructed data structure (from part I)
(1) which structure? WordNet.
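
The expansion rule described above can be sketched roughly as follows. This is only an illustration, not the authors' SMART code: the `TOY_SYNSETS` table is a hypothetical stand-in for WordNet, and the function name `expand_query` and the `min_agree` parameter are assumptions made for the example. It applies the two rules stated in this questionnaire: a synonym is added when at least two original query terms agree on it, and a term with only one sense gets all of its synonyms added.

```python
from collections import Counter

# Hypothetical toy stand-in for WordNet: term -> list of synonym sets,
# one set per sense. The entries are invented for illustration only.
TOY_SYNSETS = {
    "car": [{"auto", "automobile"}, {"railcar", "carriage"}],
    "coach": [{"trainer"}, {"carriage", "railcar"}, {"bus"}],
    "motorcar": [{"auto", "automobile"}],
    "bank": [{"depository"}, {"riverside"}],
}


def expand_query(terms, synsets=TOY_SYNSETS, min_agree=2):
    """Return the sorted list of synonyms to add to the query."""
    original = set(terms)
    votes = Counter()   # synonym -> number of original terms proposing it
    added = set()
    for term in original:
        senses = synsets.get(term, [])
        if len(senses) == 1:
            # Single-sense term: all of its synonyms are added.
            added |= senses[0]
        # Each original term votes once for every synonym in any of its senses.
        for synonym in set().union(*senses):
            votes[synonym] += 1
    # Agreement rule: keep synonyms proposed by at least min_agree terms.
    added |= {syn for syn, count in votes.items() if count >= min_agree}
    return sorted(added - original)


print(expand_query(["car", "coach"]))
print(expand_query(["motorcar"]))
print(expand_query(["car", "bank"]))
```

In this toy data, "car" and "coach" are both ambiguous but share a sense, so only the synonyms of that shared sense are added; "car" and "bank" share nothing, so the query is left unexpanded.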
III. Searching
A. Total computer time to search (CPU seconds)
1. retrieval time (total CPU seconds between when a query enters the system until a list of
document numbers are obtained)
15 CPU seconds, on average (756.4 CPU seconds to process 50 queries)
2. ranking time (total CPU seconds to sort document list)
not applicable: list of top 200 similarities maintained while searching
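
One common way to maintain a list of the top similarities while searching, so that no separate sort of the full document list is needed, is a bounded min-heap. The sketch below assumes that approach; the questionnaire does not say which data structure SMART actually uses, and the function name `top_k_while_scoring` is hypothetical.

```python
import heapq


def top_k_while_scoring(scored_docs, k=200):
    """Keep only the k highest similarities seen so far.

    scored_docs is any iterable of (doc_id, similarity) pairs, consumed
    one at a time as documents are scored.
    """
    heap = []  # min-heap of (similarity, doc_id); heap[0] is the weakest kept
    for doc_id, similarity in scored_docs:
        if len(heap) < k:
            heapq.heappush(heap, (similarity, doc_id))
        elif similarity > heap[0][0]:
            # New document beats the weakest kept one: swap it in.
            heapq.heapreplace(heap, (similarity, doc_id))
    # Only the k survivors are ordered, best first.
    return [(doc_id, sim) for sim, doc_id in sorted(heap, reverse=True)]


scores = [("d%d" % i, i / 10) for i in range(10)]
print(top_k_while_scoring(scores, k=3))
```

Memory stays proportional to k rather than to the collection size, which is the practical point of keeping the list during the search.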
B. Which methods best describe your machine searching methods?
1. vector space model
C. What factors are included in your ranking?
1. term frequency
2. inverse document frequency
4. semantic closeness (as in semantic net distance) (synonyms)
9. document length
13. word sense frequency
(nouns with only one sense in WordNet get all their synonyms added)
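
As a rough illustration of how term frequency, inverse document frequency, and document length can combine in a vector space ranking, the sketch below uses tf x idf weights with cosine length normalization. The exact weighting variant used in these runs is not given in this questionnaire, so the formula, the function names, and the tiny collection are all assumptions made for the example.

```python
import math
from collections import Counter


def tfidf_vector(tokens, doc_freq, n_docs):
    """Build a unit-length tf*idf vector for a token list."""
    tf = Counter(tokens)
    weights = {
        term: freq * math.log(n_docs / doc_freq[term])
        for term, freq in tf.items()
        if doc_freq.get(term)  # skip terms unseen in the collection
    }
    # Cosine normalization: dividing by the vector length keeps long
    # documents from winning on size alone.
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else {}


def similarity(query_vec, doc_vec):
    """Inner product of two unit vectors, i.e. their cosine."""
    return sum(w * doc_vec.get(t, 0.0) for t, w in query_vec.items())


# Invented three-document collection, for illustration only.
docs = [
    ["wordnet", "synonym", "query"],
    ["query", "retrieval", "retrieval"],
    ["vector", "space", "model"],
]
doc_freq = Counter(term for doc in docs for term in set(doc))
doc_vecs = [tfidf_vector(doc, doc_freq, len(docs)) for doc in docs]
query_vec = tfidf_vector(["query", "retrieval"], doc_freq, len(docs))
scores = [similarity(query_vec, vec) for vec in doc_vecs]
print(scores)
```

The second document matches both query terms and ranks first, the first matches only the more common term, and the third shares no terms and scores zero.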
IV. What machine did you conduct the TREC experiment on? Sun IPX
How much RAM did it have? 64 megabytes
What was the clock rate of the CPU? 40 MHz
V. Some systems are research prototypes and others are commercial.
To help compare these systems:
1. How much "software engineering" went into the development of your system?
Our system is a version of SMART with modified indexing code. SMART has been
well-engineered (but its main goal is flexibility, not raw speed). Little time was spent
optimizing our modifications.
2. Given appropriate resources, could your system be made to run faster? By how much
(estimate)?
SMART could probably be made to run somewhat faster if it were made less
flexible, that is, if we coded a version that performed only the sorts of runs we made
here. I doubt the difference would be dramatic. Preprocessing steps performed on
WordNet could improve the efficiency of the expansion code.
3. What features is your system missing that it would benefit by if it had them?
Incorporating part-of-speech tagging so that we could know if the term is a noun
before looking it up in WordNet should be beneficial (we didn't do this for TREC
because the tagger we have is fairly slow). In the same vein, a true sense
disambiguator--a way of picking the correct WordNet synonym set--would clearly
help, but I don't know of a way of doing that automatically yet (it is part of our
research).