SP500207 NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
CLARIT TREC Design, Experiments, and Results
D. Evans, R. Lefferts, G. Grefenstette, S. Handerson, W. Hersh, A. Archbold
National Institute of Standards and Technology
Donna K. Harman

* Data from Partitioning Thesaurus:

  Thes_Weight(term):  Real number weight assigned to term in partitioning thesaurus
  Thes_Whole(term):   Boolean value indicating that term is a whole term in the thesaurus (1) or an attested sub-term of a whole term (0)

* Data from Document Text:

  Tot_Terms:          Number of terms in a document
  Num_Terms:          Number of unique terms (or sub-terms) found in document that match terms (or sub-terms) in the partitioning thesaurus
  Term_Freq(term):    Frequency of term in document
  Term_Length(term):  Number of words in term
  Text_Whole(term):   Boolean value indicating that term is a whole term in the text (1) or an attested sub-term of a whole term (0)

Figure 15: Feature Matching Score (Partitioning) Input Data

\[
\mathit{Doc\_Score} =
\frac{\sum_{i=1}^{\mathit{Num\_Terms}} \mathit{Term\_Score}(term_i)}{\ln(\mathit{Tot\_Terms} + 1.72)}
\times
\left[ \frac{\sum_{i=1}^{\mathit{Num\_Terms}} \mathit{Term\_Freq}(term_i)}{\mathit{Tot\_Terms} + \text{[illegible]}} \right]^{2}
\]

\[
\mathit{Term\_Score}(term) =
\begin{cases}
\mathit{Raw\_Score}(term) & \text{if } (\mathit{Thes\_Whole}(term) = 1) \wedge (\mathit{Text\_Whole}(term) = 1) \\[4pt]
\dfrac{\mathit{Raw\_Score}(term)}{4.0} & \text{if } (\mathit{Thes\_Whole}(term) = 1) \wedge (\mathit{Text\_Whole}(term) = 0) \\[8pt]
\dfrac{\mathit{Raw\_Score}(term)}{8.0} & \text{if } (\mathit{Thes\_Whole}(term) = 0) \wedge (\mathit{Text\_Whole}(term) = 1) \\[4pt]
0.0 & \text{if } (\mathit{Thes\_Whole}(term) = 0) \wedge (\mathit{Text\_Whole}(term) = 0)
\end{cases}
\]

\[
\mathit{Raw\_Score}(term) =
2^{[\ln(\mathit{Term\_Length}(term)^{3}) - 1]}
\times \mathit{Thes\_Weight}(term)
\times \mathit{Term\_Freq}(term)
\]

Figure 16: Formula for Scoring Documents in `Partitioning'

\[
\underbrace{\left[ \frac{1}{\ln(total)} \right]}_{\text{Doc-Length Factor}}
\times
\underbrace{\left[ \left( \frac{hits}{total + \text{[illegible]}} \right)^{2} \right]}_{\text{Hit-Opportunity Factor}}
\times
\sum \; term\_weight \times term\_freq \times
\underbrace{\left[ 2^{\text{[illegible]}} \right]}_{\text{Phrasal-Term Factor}}
\times
\underbrace{\text{[illegible]}}_{\text{Status Factor}}
\]
Figure 17: Schematization of `Partitioning' Formula
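The scoring formula of Figure 16 can be sketched in a few lines of Python. This is a minimal illustration rather than the CLARIT implementation. The class `TermMatch` and the functions `raw_score`, `term_score`, and `doc_score` are hypothetical names introduced here. Two details are assumptions: the constant added to Tot_Terms in the hit-opportunity factor is not legible in the source, so it is exposed as a hypothetical `hit_offset` parameter, and the exponent in Raw_Score is read as ln(Term_Length^3) - 1.

```python
import math
from dataclasses import dataclass


@dataclass
class TermMatch:
    """One thesaurus term (or sub-term) matched in a document (hypothetical type)."""
    thes_weight: float  # Thes_Weight(term)
    thes_whole: bool    # Thes_Whole(term): whole term in the thesaurus
    text_whole: bool    # Text_Whole(term): whole term in the text
    term_freq: int      # Term_Freq(term): frequency in the document
    term_length: int    # Term_Length(term): number of words, >= 1


def raw_score(m: TermMatch) -> float:
    # ASSUMPTION: exponent read as ln(Term_Length^3) - 1 from the garbled figure.
    exponent = math.log(m.term_length ** 3) - 1.0
    return (2.0 ** exponent) * m.thes_weight * m.term_freq


def term_score(m: TermMatch) -> float:
    # Status factor: full credit for whole/whole, reduced credit for partial matches.
    raw = raw_score(m)
    if m.thes_whole and m.text_whole:
        return raw
    if m.thes_whole:
        return raw / 4.0
    if m.text_whole:
        return raw / 8.0
    return 0.0


def doc_score(matches: list[TermMatch], tot_terms: int, hit_offset: float = 1.0) -> float:
    # ASSUMPTION: hit_offset stands in for the illegible constant added to Tot_Terms.
    if not matches:
        return 0.0
    length_factor = 1.0 / math.log(tot_terms + 1.72)   # Doc-Length Factor
    hits = sum(m.term_freq for m in matches)
    hit_factor = (hits / (tot_terms + hit_offset)) ** 2  # Hit-Opportunity Factor
    return length_factor * hit_factor * sum(term_score(m) for m in matches)
```

Under these assumptions, a one-word term has a phrasal factor of 2^(-1) = 0.5, and longer terms are weighted progressively more heavily, which matches the role the schematization assigns to that factor.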