NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Appendix C: System Features
National Institute of Standards and Technology
Donna K. Harman

1 DECstation 5000 (24-Meg RAM)
3 DECstation 3100 (24-Meg RAM)

V. Some systems are research prototypes and others are commercial. To help compare these systems:

1. How much "software engineering" went into the development of your system?

The CLARIT system is a research prototype and has been under development for 4 years. The original system was implemented in Lisp; the current system has been re-engineered into C in the past 12 months. The specific configuration of the system used in the TREC experiments was produced in less than a week. As a research prototype, the system has minimal true "software engineering".

2. Given appropriate resources, could your system be made to run faster? By how much (estimate)?

Size constraints and the lack of global methods of attack caused us to duplicate work (both human and computer). Global methods that are smarter about resource consumption could make an order of magnitude difference. Almost all CLARIT processing is modular and separable; results of processes are additive/composable. Splitting the process across machines, or running in parallel, would greatly speed up the system.

3. What features is your system missing that it would benefit by if it had them?

User interface. Some database mechanism for document storage. Potential "next features" include the following:
- automatic spelling correction
- integrated proper noun recognition
- programmable token recognition
- programmable / automated category assignment (guessing)
- programmable / automated document structure analysis
- automated synonym / related word discovery and use
- database support for domains and thesauri, contexts, etc.
- an integrated interface for both database construction and query elaboration