Definitions of terms used in Information Extraction

Attribute
a property of an entity such as its name, alias, descriptor, or type
Annotation
markup of a text span, in a specific format, that indicates one or more features of the text within the span
Benchmark
assessment of performance according to standard measures
Data
textual input for an information extraction system
Dataset
a set of newswire texts chosen according to pre-specified conditions and meant to represent a rich text stream
Database
data in tabular format, stored and managed with a relational database management system
Developer
a researcher who implements a system
Dry Run
an end-to-end practice run of an evaluation
Entity
an object of interest such as a person or organization
Evaluation
assessment of performance according to agreed-upon measures
Event
an activity or occurrence of interest such as a terrorist act or an airline crash
Fact
a relationship that holds between two or more entities
Formal Test Material
the blind dataset, task definitions, test procedure, answer keys, and scoring software used for an evaluation
Formal Run
the "official" evaluation
Information Extraction
the extraction of pertinent information from large volumes of text
Information Extraction Systems
automated systems that extract pertinent information from large volumes of text
Information Extraction Technologies
techniques used to automatically extract specified information from text
Metrics
predefined measures of performance, calculated by comparing system output with human-generated answer keys
MUC
the Message Understanding Conference, held at the end of the evaluation and attended only by participants and invited potential customers
Named Entity
a named object of interest such as a person, organization, or location
SAIC
Science Applications International Corporation
Scoring Software
fully automated software that compares system output against answer keys, then tallies and reports metrics and error types for developers and evaluators
Search Engine
software that ranks the documents in a collection by their relevance to a user query
Sources of News
edited electronic feeds from established news organizations such as the Wall Street Journal and the New York Times News Service
Statistical Algorithm
algorithm to determine the statistical significance of evaluation results
Systems Integration
building a system from off-the-shelf components to accomplish a task that was not previously automated
Systems Integrator
builder of a system from off-the-shelf components
Task Definition
a document that defines the format and criteria for annotating or extracting text and placing the results into a database or template. For example, task definitions give general guidelines and examples for extracting named entities, attributes, facts, and events from texts.
Text
electronically encoded alphabetic material from some human language
Training
process by which a system learns about a dataset
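
The Metrics and Scoring Software entries describe comparing system output with human-generated answer keys. As a minimal sketch, assuming each named entity is represented as a hypothetical (start, end, label) triple of character offsets and type, and that only exact matches earn credit, recall, precision, and the F-measure (the measures customarily reported in MUC evaluations) could be computed as below. The names Annotation and score are illustrative only; they are not part of any MUC scoring software, which was considerably more detailed, for example in how it categorized errors.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    """Hypothetical span-level annotation: character offsets plus a type label."""
    start: int
    end: int
    label: str


def score(system: set[Annotation], key: set[Annotation], beta: float = 1.0) -> dict[str, float]:
    """Compare system output with an answer key using exact span-and-type matching.

    Simplifying assumption: no partial credit, so a span found with the wrong
    label counts once as spurious and once as missing.
    """
    correct = len(system & key)    # items the system produced exactly as in the key
    missing = len(key - system)    # key items the system failed to produce
    spurious = len(system - key)   # system items that do not appear in the key

    recall = correct / len(key) if key else 0.0
    precision = correct / len(system) if system else 0.0

    # Weighted harmonic mean of precision and recall; beta = 1 weights them equally.
    if precision == 0.0 and recall == 0.0:
        f_measure = 0.0
    else:
        f_measure = (beta**2 + 1) * precision * recall / (beta**2 * precision + recall)

    return {
        "correct": correct,
        "missing": missing,
        "spurious": spurious,
        "recall": recall,
        "precision": precision,
        "f_measure": f_measure,
    }


if __name__ == "__main__":
    # Toy answer key and system output (invented offsets, for illustration only).
    key = {
        Annotation(0, 14, "PERSON"),
        Annotation(24, 28, "ORGANIZATION"),
        Annotation(40, 47, "LOCATION"),
    }
    system = {
        Annotation(0, 14, "PERSON"),
        Annotation(24, 28, "LOCATION"),
    }
    print(score(system, key))
```

In the toy run, the system gets one entity right, labels a second span with the wrong type, and misses the third, so recall is 1/3 and precision is 1/2. Treating a mistyped span as both spurious and missing keeps the sketch short; a fuller scorer would report such cases as a separate error type.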


For more website information contact: Ellen Voorhees
For more evaluation information contact: Nancy Chinchor
Last updated: Tuesday, 08-Mar-2005 13:16:36 MST
Date created: Friday, 12-Jan-01