SP500207
NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Appendix C: System Features
appendix
National Institute of Standards and Technology
Donna K. Harman
1 DECstation 5000 (24-Meg RAM)
3 DECstation 3100 (24-Meg RAM)
V. Some systems are research prototypes and others are commercial.
To help compare these systems:
1. How much "software engineering" went into the development of your system?
The CLARIT system is a research prototype and has been under development for
4 years. The original system was implemented in Lisp; the current system has been
re-engineered into C in the past 12 months.
The specific configuration of the system used in the TREC experiments was
produced in less than a week.
As a research prototype, the system has minimal true "software engineering".
2. Given appropriate resources, could your system be made to run faster? By how much
(estimate)?
Size constraints and the lack of global methods of attack caused us to duplicate work
(both human and computer). Global methods that are smarter about resource
consumption could make an order of magnitude difference. Almost all CLARIT
processing is modular and separable; results of processes are additive/composable.
Splitting the process across machines--or running in parallel--would greatly speed
up the system.
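
The claim that separable processing with additive results can be split across
workers can be illustrated with a minimal sketch. It is not CLARIT code: the
term-counting step, the function names (count_terms, merge, process_in_parallel),
and the sample partitions are all assumptions made for illustration. Each
partition is processed independently, and because the partial results combine by
addition, they can be merged in any order.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_terms(documents):
    """Process one partition on its own: count whitespace-separated terms.

    Hypothetical stand-in for any separable processing step.
    """
    counts = Counter()
    for text in documents:
        counts.update(text.lower().split())
    return counts


def merge(partials):
    """Combine partial results by addition, so merge order does not matter."""
    total = Counter()
    for partial in partials:
        total += partial
    return total


def process_in_parallel(partitions, workers=2):
    """Run count_terms on each partition concurrently, then merge the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return merge(pool.map(count_terms, partitions))


# Illustrative data only: two partitions of a tiny document collection.
partitions = [
    ["information retrieval", "text retrieval conference"],
    ["retrieval of text"],
]
combined = process_in_parallel(partitions)
```

Threads keep the sketch short; for CPU-bound work in Python, a process pool or
separate machines would likely be needed to see a real speedup. The merge step
stays the same either way, which is the point of additive, composable results.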
3. What features is your system missing that it would benefit by if it had them?
User interface.
Some database mechanism for document storage.
Potential "next features" include the following:
- automatic spelling correction
- integrated proper noun recognition
- programmable token recognition
- programmable / automated category assignment (guessing)
- programmable / automated document structure analysis
- automated synonym / related word discovery and use
- database support for domains and thesauri, contexts, etc.
- an integrated interface for both database construction and query
elaboration