Definitions of terms used in Information Extraction
- Attribute
- a property of an entity such as its name, alias, descriptor, or type
- Annotation
- markup of a text span in a specific format that indicates one or more
features of the text within the span
- Benchmark
- assessment of performance according to standard measures
- Data
- textual input for an information extraction system
- Dataset
- a set of newswire texts chosen according to pre-specified
conditions and meant to represent a rich text stream
- Database
- data in tabular format stored with the assistance of a
relational database management system
- Developer
- a researcher who implements a system
- Dry Run
- an end-to-end practice run of an evaluation
- Entity
- an object of interest such as a person or organization
- Evaluation
- assessment of performance according to agreed-upon measures
- Event
- an activity or occurrence of interest such as a terrorist act
or an airline crash
- Fact
- a relationship that holds between two or more entities
- Formal Test Material
- a blind dataset, task definitions, test
procedure, answer keys, and scoring software
- Formal Run
- the "official" evaluation
- Information Extraction
- the extraction of pertinent
information from large volumes of text
- Information Extraction Systems
- automated systems that extract
pertinent information from large volumes of text
- Information Extraction Technologies
- techniques used to automatically
extract specified information from text
- Metrics
- pre-defined measures of performance calculable by comparison
of system output with human-generated answer keys
- MUC
- Message Understanding Conference, held at the end of the
evaluation and attended only by participants and invited potential customers
- Named Entity
- a named object of interest such as a person,
organization, or location
- SAIC
- Science Applications International Corporation
- Scoring Software
- fully automated software that compares system output against answer keys
and tallies and reports metrics and error types for developers and evaluators
- Search Engine
- software that ranks the documents in a collection
by their relevance to a user query
- Sources of News
- edited electronic feeds from established news
organizations such as the Wall Street Journal and the New York Times News Service
- Statistical Algorithm
- an algorithm for determining the statistical
significance of evaluation results
- Systems Integration
- building a system from off-the-shelf
components to accomplish a job previously not automated
- Systems Integrator
- builder of a system from off-the-shelf components
- Task Definition
- a document that defines the format and criteria for
annotating or extracting text and placing it into a database or template. For example, task
definitions give general guidelines and examples for the extraction of named entities,
attributes, facts, and events from texts.
- Text
- electronically encoded alphabetic material from some human
language
- Training
- process by which a system learns about a dataset
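To make the Annotation entry concrete, the sketch below reads inline markup in the SGML style used for MUC named entity annotation (`<ENAMEX TYPE="...">...</ENAMEX>`) and converts it into character-offset spans over the unmarked text. The example sentence, the function name `parse_annotations`, and the restriction to ENAMEX tags are assumptions made for illustration; real annotated files also used other tag types (for example TIMEX and NUMEX) and this sketch does not handle nested tags.

```python
import re
from typing import List, Tuple

# Matches one inline named-entity annotation such as
# <ENAMEX TYPE="LOCATION">Boston</ENAMEX>; nested tags are not handled.
ENAMEX = re.compile(r'<ENAMEX TYPE="([A-Z]+)">(.*?)</ENAMEX>')

# Hypothetical annotated sentence used only for illustration.
EXAMPLE = (
    'The <ENAMEX TYPE="ORGANIZATION">Acme Corp.</ENAMEX> office in '
    '<ENAMEX TYPE="LOCATION">Boston</ENAMEX> closed.'
)


def parse_annotations(marked: str) -> Tuple[str, List[Tuple[int, int, str]]]:
    """Strip ENAMEX tags and return (plain_text, [(start, end, type), ...]).

    Offsets refer to the plain text, so plain_text[start:end] is the entity.
    """
    parts: List[str] = []
    spans: List[Tuple[int, int, str]] = []
    cursor = 0  # position in the marked-up string
    offset = 0  # length of the plain text built so far
    for match in ENAMEX.finditer(marked):
        before = marked[cursor:match.start()]
        parts.append(before)
        offset += len(before)
        entity_type, surface = match.group(1), match.group(2)
        spans.append((offset, offset + len(surface), entity_type))
        parts.append(surface)
        offset += len(surface)
        cursor = match.end()
    parts.append(marked[cursor:])  # text after the last tag
    return "".join(parts), spans


if __name__ == "__main__":
    text, spans = parse_annotations(EXAMPLE)
    print(text)
    for start, end, entity_type in spans:
        print(entity_type, repr(text[start:end]))
```

Keeping offsets relative to the plain text lets the same spans be compared directly against system output produced from the untagged document.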
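The Database, Entity, and Fact entries can be tied together with a small relational schema: one table holds entities and their types, another holds facts as named relations between entity IDs. The table names, column names, and helper functions below are hypothetical, and SQLite from Python's standard library stands in for whatever relational database management system an actual extraction system would use.

```python
import sqlite3
from typing import List, Tuple


def build_db() -> sqlite3.Connection:
    """Create an in-memory database with hypothetical entity and fact tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # make SQLite enforce REFERENCES
    conn.executescript(
        """
        CREATE TABLE entity (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            type TEXT NOT NULL          -- e.g. PERSON, ORGANIZATION, LOCATION
        );
        CREATE TABLE fact (
            relation TEXT NOT NULL,     -- e.g. located_in
            arg1 INTEGER NOT NULL REFERENCES entity(id),
            arg2 INTEGER NOT NULL REFERENCES entity(id)
        );
        """
    )
    return conn


def add_entity(conn: sqlite3.Connection, name: str, entity_type: str) -> int:
    """Insert an entity and return its generated id."""
    cur = conn.execute(
        "INSERT INTO entity (name, type) VALUES (?, ?)", (name, entity_type)
    )
    return cur.lastrowid


def add_fact(conn: sqlite3.Connection, relation: str, arg1: int, arg2: int) -> None:
    """Record a relationship between two existing entities."""
    conn.execute(
        "INSERT INTO fact (relation, arg1, arg2) VALUES (?, ?, ?)",
        (relation, arg1, arg2),
    )


def list_facts(conn: sqlite3.Connection) -> List[Tuple[str, str, str]]:
    """Return facts as (entity name, relation, entity name) rows."""
    return conn.execute(
        """
        SELECT e1.name, f.relation, e2.name
        FROM fact AS f
        JOIN entity AS e1 ON e1.id = f.arg1
        JOIN entity AS e2 ON e2.id = f.arg2
        ORDER BY f.rowid
        """
    ).fetchall()
```

Storing facts as references to entity rows, rather than repeating names, means an entity's attributes are kept in one place no matter how many facts mention it.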
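The Scoring Software and Metrics entries describe comparing system output with human-generated answer keys. The following is a minimal sketch, assuming each entity is identified by an exact (start, end) character span plus a type: it tallies correct, incorrect, spurious, and missing responses and derives recall, precision, and the F-measure from those counts. The function name and the exact-match rule are assumptions; in particular, this sketch gives no credit for spans that only partially overlap the key.

```python
from typing import Dict, Tuple

Span = Tuple[int, int]  # (start, end) character offsets


def score(key: Dict[Span, str], response: Dict[Span, str], beta: float = 1.0) -> dict:
    """Compare a system response with an answer key, both mapping span -> entity type."""
    correct = incorrect = spurious = 0
    for span, resp_type in response.items():
        if span not in key:
            spurious += 1  # system produced an entity the key does not contain
        elif key[span] == resp_type:
            correct += 1  # same span, same type
        else:
            incorrect += 1  # same span, wrong type
    # Key entities the system did not produce at all.
    missing = sum(1 for span in key if span not in response)

    possible = correct + incorrect + missing  # everything in the key
    actual = correct + incorrect + spurious  # everything the system produced
    recall = correct / possible if possible else 0.0
    precision = correct / actual if actual else 0.0
    # Weighted harmonic mean of precision and recall; beta = 1 weighs them equally.
    b2 = beta * beta
    denom = b2 * precision + recall
    f_measure = (b2 + 1) * precision * recall / denom if denom else 0.0
    return {
        "correct": correct,
        "incorrect": incorrect,
        "spurious": spurious,
        "missing": missing,
        "recall": recall,
        "precision": precision,
        "f_measure": f_measure,
    }
```

Reporting the four error counts alongside the summary metrics is what lets a developer see whether a low score comes from wrong types, invented entities, or missed ones.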
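For the Statistical Algorithm entry, one standard way to test whether two systems' scores on the same documents really differ is approximate randomization, a paired permutation test: randomly swap the two systems' scores on each document many times and count how often the shuffled difference is at least as large as the observed one. The sketch below assumes per-document scores as input, a fixed random seed, and a mean-difference test statistic; it illustrates the technique and is not presented as the procedure used in any particular evaluation.

```python
import random
from statistics import mean
from typing import Sequence


def approximate_randomization(
    scores_a: Sequence[float],
    scores_b: Sequence[float],
    trials: int = 10_000,
    seed: int = 0,
) -> float:
    """Estimate a two-sided p-value for the mean per-document score difference."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two equally long, non-empty score lists")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(mean(diffs))
    rng = random.Random(seed)
    at_least_as_extreme = 0
    for _ in range(trials):
        # Swapping the two systems' scores on a document flips the sign of
        # that document's difference; each swap happens with probability 1/2.
        shuffled = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(mean(shuffled)) >= observed - 1e-12:  # tolerance for float noise
            at_least_as_extreme += 1
    # Add-one smoothing keeps the estimate away from an impossible p = 0.
    return (at_least_as_extreme + 1) / (trials + 1)
```

A small p-value suggests the difference between the two systems is unlikely to be an accident of which documents happened to be in the test set.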
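The Search Engine entry describes ranking documents by relevance to a user query. A common textbook approach, swapped in here purely as an illustration, scores each document with TF-IDF: query terms that occur often in a document but rarely across the collection raise its score. The tokenizer, the smoothed IDF formula, and the length normalization are all assumptions of this sketch rather than the method of any specific search engine.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(query: str, docs: List[str]) -> List[int]:
    """Return document indices ordered from most to least relevant to the query."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df: Counter = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    query_terms = set(tokenize(query))
    scored = []
    for index, tokens in enumerate(tokenized):
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if tf[term]:
                # Smoothed IDF: terms that are rarer across the collection weigh more.
                idf = math.log((n + 1) / (df[term] + 1)) + 1.0
                # Length-normalized term frequency times IDF.
                score += (tf[term] / len(tokens)) * idf
        scored.append((score, index))
    # Highest score first; ties keep the original document order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [index for _, index in scored]
```

For example, with the query "airline crash", a short document devoted to an airline crash ranks above a longer one that mentions the crash among other details, and an unrelated financial story ranks last.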
For more website information contact:
Ellen Voorhees
For more evaluation information contact:
Nancy Chinchor
Last updated: Tuesday, 08-Mar-2005 15:16:36 EST
Date created: Friday, 12-Jan-01