SP500207
NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
CLARIT TREC Design, Experiments, and Results
chapter
D. Evans
R. Lefferts
G. Grefenstette
S. Handerson
W. Hersh
A. Archbold
National Institute of Standards and Technology
Donna K. Harman
* Data from Partitioning Thesaurus:
  Thes_Weight(term): Real number weight assigned to term in partitioning thesaurus
  Thes_Whole(term): Boolean value indicating that term is a whole term in the thesaurus (1) or an attested sub-term of a whole term (0)
* Data from Document Text:
  Tot_Terms: Number of terms in a document
  Num_Terms: Number of unique terms (or sub-terms) found in document that match terms (or sub-terms) in the partitioning thesaurus
  Term_Freq(term): Frequency of term in document
  Term_Length(term): Number of words in term
  Text_Whole(term): Boolean value indicating that term is a whole term in the text (1) or an attested sub-term of a whole term (0).
Figure 15: Feature Matching Score (Partitioning) Input Data
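As a rough illustration of how the inputs listed in Figure 15 might be assembled, the following Python sketch derives the document-side values from a list of terms already extracted from a document and a toy partitioning thesaurus. The names `ThesEntry`, `TermStats`, and `collect_inputs` are hypothetical. The sketch also assumes that Tot_Terms counts every term occurrence, that whole/sub-term status is supplied by the caller, and that a term counts as whole in the text if any of its occurrences is whole; the figure does not settle these details.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ThesEntry:
    weight: float  # Thes_Weight(term)
    whole: bool    # Thes_Whole(term)


@dataclass(frozen=True)
class TermStats:
    freq: int         # Term_Freq(term)
    length: int       # Term_Length(term), in words
    text_whole: bool  # Text_Whole(term)


def collect_inputs(doc_terms, thesaurus):
    """Return (Tot_Terms, Num_Terms, per-term stats) for one document.

    doc_terms: list of (term, text_whole) pairs, one pair per occurrence.
    thesaurus: dict mapping term -> ThesEntry.
    """
    tot_terms = len(doc_terms)  # assumption: every occurrence counts
    freq = Counter(term for term, _ in doc_terms)

    # Assumption: a term is "whole" in the text if any occurrence is whole.
    whole = {}
    for term, is_whole in doc_terms:
        whole[term] = whole.get(term, False) or is_whole

    # Keep only the unique terms that also appear in the thesaurus.
    matched = {
        term: TermStats(freq[term], len(term.split()), whole[term])
        for term in freq
        if term in thesaurus
    }
    return tot_terms, len(matched), matched
```

Terms that do not appear in the thesaurus still contribute to Tot_Terms in this sketch but are excluded from Num_Terms and from the per-term statistics.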
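\[
\mathit{Doc\_Score} =
\frac{\sum_{i=1}^{\mathit{Num\_Terms}} \mathit{Term\_Score}(\mathit{term}_i)}
     {\ln(\mathit{Tot\_Terms} + 1.72)}
\times
\left[
\frac{\sum_{i=1}^{\mathit{Num\_Terms}} \mathit{Term\_Freq}(\mathit{term}_i)}
     {\mathit{Tot\_Terms} + 1}
\right]^{2}
\]

\[
\mathit{Term\_Score}(\mathit{term}) =
\begin{cases}
\mathit{Raw\_Score}(\mathit{term}) &
  \text{if } (\mathit{Thes\_Whole}(\mathit{term}) = 1) \land (\mathit{Text\_Whole}(\mathit{term}) = 1) \\[4pt]
\dfrac{\mathit{Raw\_Score}(\mathit{term})}{4.0} &
  \text{if } (\mathit{Thes\_Whole}(\mathit{term}) = 1) \land (\mathit{Text\_Whole}(\mathit{term}) = 0) \\[4pt]
\dfrac{\mathit{Raw\_Score}(\mathit{term})}{8.0} &
  \text{if } (\mathit{Thes\_Whole}(\mathit{term}) = 0) \land (\mathit{Text\_Whole}(\mathit{term}) = 1) \\[4pt]
0.0 &
  \text{if } (\mathit{Thes\_Whole}(\mathit{term}) = 0) \land (\mathit{Text\_Whole}(\mathit{term}) = 0)
\end{cases}
\]

\[
\mathit{Raw\_Score}(\mathit{term}) =
2^{[\ln(\mathit{Term\_Length}(\mathit{term})^{3}) - 1]}
\times \mathit{Thes\_Weight}(\mathit{term})
\times \mathit{Term\_Freq}(\mathit{term})
\]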
Figure 16: Formula for Scoring Documents in `Partitioning'
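The scoring scheme of Figure 16 can be sketched in Python as follows. This is a minimal illustration, not the CLARIT implementation. It assumes that the denominator of the squared factor is Tot_Terms + 1 and that the square brackets in the Raw_Score exponent are plain grouping, so the exponent is ln(Term_Length^3) - 1. The names `MatchedTerm`, `raw_score`, `term_score`, and `doc_score` are hypothetical.

```python
import math
from typing import NamedTuple


class MatchedTerm(NamedTuple):
    length: int        # Term_Length(term)
    weight: float      # Thes_Weight(term)
    freq: int          # Term_Freq(term)
    thes_whole: bool   # Thes_Whole(term)
    text_whole: bool   # Text_Whole(term)


def raw_score(t: MatchedTerm) -> float:
    # 2^[ln(Term_Length^3) - 1] x Thes_Weight x Term_Freq
    # (brackets read as grouping: an assumption)
    exponent = math.log(t.length ** 3) - 1.0
    return 2.0 ** exponent * t.weight * t.freq


def term_score(t: MatchedTerm) -> float:
    # Discount the raw score by whole/sub-term status on each side.
    raw = raw_score(t)
    if t.thes_whole and t.text_whole:
        return raw
    if t.thes_whole:       # whole in thesaurus, sub-term in text
        return raw / 4.0
    if t.text_whole:       # sub-term in thesaurus, whole in text
        return raw / 8.0
    return 0.0             # sub-term on both sides


def doc_score(tot_terms: int, terms: list) -> float:
    # Sum of term scores, normalized by log document length ...
    summed = sum(term_score(t) for t in terms)
    length_factor = 1.0 / math.log(tot_terms + 1.72)
    # ... and scaled by the squared share of matched occurrences
    # (the "+ 1" in the denominator is an assumption).
    hits = sum(t.freq for t in terms)
    hit_factor = (hits / (tot_terms + 1)) ** 2
    return summed * length_factor * hit_factor
```

Under this reading, a single-word term receives a length multiplier of 2^(-1) = 0.5, and the multiplier grows with the number of words in the term, so longer phrasal matches contribute more to the document score.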
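\[
\overbrace{\left[\frac{1}{\ln(\mathit{total})}\right]}^{\text{Doc-Length Factor}}
\times
\overbrace{\left[\left(\frac{\mathit{hits}}{\mathit{total} + 1}\right)^{2}\right]}^{\text{Hit-Opportunity Factor}}
\times
\sum \mathit{term\_weight} \times \mathit{term\_freq}
\times
\overbrace{\left[2^{[\ln(\mathit{term\_length}^{3}) - 1]}\right]}^{\text{Phrasal-Term Factor}}
\times
\overbrace{\left[1,\ \tfrac{1}{4},\ \tfrac{1}{8},\ \text{or } 0\right]}^{\text{Status Factor}}
\]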
Figure 17: Schematization of `Partitioning' Formula