Statistical Label Classifier
Use the topic-labeler facility from UTD
Compute several features for each phrase term
- tf-idf, p(t|F), p(t|F) / p(t), % of docs in folder containing the term, is the term a multi-word phrase? is it a person name? location name? org name?
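A minimal sketch of how the count-based features could be computed. Everything here is an assumption for illustration: the function name `term_features`, the document representation (each document is a list of already-extracted terms), the estimate of p(t|F) as relative frequency within the folder F, p(t) as relative frequency in a background collection that includes the folder, and the particular tf-idf variant. The name-type features (person, location, org) would need a named-entity recognizer and are left out.

```python
import math
from collections import Counter

def term_features(term, folder_docs, background_docs):
    """Compute illustrative features for one candidate term.

    folder_docs, background_docs: lists of documents, each a list of terms.
    background_docs is assumed to include the folder's documents.
    """
    folder_counts = Counter(t for doc in folder_docs for t in doc)
    background_counts = Counter(t for doc in background_docs for t in doc)

    # p(t|F): relative frequency of the term inside the folder
    tf = folder_counts[term]
    p_t_given_f = tf / sum(folder_counts.values())

    # p(t): relative frequency of the term in the background collection
    p_t = background_counts[term] / sum(background_counts.values())

    # idf over the background collection (assumed variant: log(N / df))
    df = sum(1 for doc in background_docs if term in doc)
    idf = math.log(len(background_docs) / df) if df else 0.0

    # fraction of folder documents that contain the term
    docs_with_term = sum(1 for doc in folder_docs if term in doc)

    return {
        "tf_idf": tf * idf,
        "p_t_given_F": p_t_given_f,
        "lift": p_t_given_f / p_t if p_t else 0.0,  # p(t|F) / p(t)
        "doc_fraction": docs_with_term / len(folder_docs),
        "is_multiword": " " in term,
    }
```

The ratio p(t|F) / p(t) rewards terms that are much more frequent in the folder than in general, which is why it is kept as a separate feature from p(t|F) itself.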
Compute p(term is label | features)
- Model is trained on PSM topics (which have labels)
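The model family is not stated here, so the following is only a sketch that assumes logistic regression as one plausible way to estimate p(term is label | features). The function names, the two-feature layout (doc fraction, multi-word flag), and the toy training rows are all made up for illustration; real training data would come from the labeled PSM topics.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # prediction error for this example
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def p_label(features, w, b):
    """Estimated p(term is label | features)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, features)) + b)

# Toy rows (hypothetical): [doc_fraction, is_multiword]; 1 = term was a topic label
X = [[0.9, 1.0], [0.8, 1.0], [0.7, 0.0], [0.3, 1.0], [0.2, 0.0], [0.1, 0.0]]
y = [1, 1, 1, 0, 0, 0]
w, b = train_logistic(X, y)
```

Candidate terms for a folder can then be ranked by `p_label(features, w, b)` and the top-scoring ones proposed as labels.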
Label terms always occur in the folder
Label terms are almost always relevant