DESCRIPTION OF LOCKHEED MARTIN'S NLTOOLSET

AS APPLIED TO MUC-7 (AATM7)

Deborah Brady

Lois Childs

David Cassel

Bob Magee

Norris Heintzelman

Dr. Carl Weir

 

Lockheed Martin

Management & Data Systems (M&DS)

Building 10, Room 1527

P.O. Box 8048

Philadelphia, PA 19101

 

BACKGROUND

 

The NLToolset has been used to build a variety of information extraction applications, ranging from military message traffic to newswire accounts of corporate activity. AATM7 is an acronym for As Applied To MUC-7; the system was not tailored specifically for MUC-7, but rather represents the NLToolset in a state of flux, as TIPSTER experimentation and the delivery of a real-world application were taking place simultaneously. This contrast in domains proved beneficial for our real-world applications, perhaps to the detriment of the MUC-7 system, which had to compete for developers.

 

NLToolset applications are delivered under both the Windows NT and UNIX Solaris operating systems.

 

TEMPLATE ELEMENT TASK

 

AATM7 was applied to the MUC-7 Template Element task in order to test some theories of coreference that were being investigated under the TIPSTER III research activity. The Template Element task requires an automatic system to build templates for every person, organization, and artifact entity, as well as every location.

Entities

The entities are defined as follows:

An organization object consists of:

organization's name and aliases found in the text,

a type slot of ORGANIZATION,

one descriptor phrase, and

the category of the organization: ORG_CO, ORG_GOVT, or ORG_OTHER.

 

A person object consists of:

person's name and aliases found in the text,

a type slot of PERSON,

one descriptor phrase, and

the category of the person: PER_CIV or PER_MIL.

 

An artifact object consists of:

artifact's name and aliases found in the text,

a type slot of ARTIFACT,

one descriptor phrase, and

the category of the artifact: ART_AIR, ART_LAND, or ART_WATER.

 

To perform this task perfectly, an automatic system must link all references to the same entity within a text and collect those references, whether they be names or descriptive noun phrases. The entire list of unique names for an entity is placed in the "NAME" slot. Of the descriptors found, the system must pick one and place it in the "DESCRIPTOR" slot, as long as it is not "insubstantial" according to the fill rules, e.g. "the company" or "Dr." Pronouns are also excluded from the entity object. Additionally, the system must decide to what category the entity belongs, either through its knowledge base or the surrounding context, e.g. "Gen. Smith" vs. "Ms. Smith" as PER_MIL vs. PER_CIV.
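
The fill logic just described lends itself to a simple illustration. The following Python sketch shows one way a TE entity record and the one-descriptor selection rule might be realized; the record layout and the insubstantial-phrase list are illustrative assumptions, not the NLToolset's actual data structures.

    # A minimal sketch of a Template Element entity record and the
    # one-descriptor selection rule.  Slot names mirror the task
    # definition; everything else is assumed for illustration.
    from dataclasses import dataclass, field

    INSUBSTANTIAL = {"the company", "the firm", "dr.", "mr.", "ms."}

    @dataclass
    class TemplateEntity:
        ent_type: str                                     # PERSON, ORGANIZATION, or ARTIFACT
        names: list = field(default_factory=list)         # unique names and aliases
        descriptors: list = field(default_factory=list)   # all descriptive NPs found
        category: str = ""                                # e.g. PER_MIL, ORG_CO, ART_AIR

        def best_descriptor(self):
            """Pick one substantial descriptor for the DESCRIPTOR slot."""
            for d in self.descriptors:
                if d.lower() not in INSUBSTANTIAL:
                    return d
            return None   # pronouns and insubstantial phrases are excluded

    smith = TemplateEntity("PERSON", names=["Gen. Smith", "Smith"],
                           descriptors=["Dr.", "the base commander"],
                           category="PER_MIL")
    assert smith.best_descriptor() == "the base commander"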

 

The limitation to one descriptor can have the effect of hiding how well the coreference resolution has performed, since a system may have found all descriptive phrases, plus one incorrect descriptor, and chosen the incorrect descriptor, thus getting a score of incorrect for the entire slot. Lockheed Martin is planning to test a multiple-descriptor version of MUC-7 in the near future.

 

Of the three entity types, those of "PERSON" and "ORGANIZATION" are the most similar, since language is used in similar ways to describe them. They both can be named, where the "name" is an identity which, within the context of a story, is usually unique. The artifact, which in MUC terms can be a land, air, sea, or space vehicle, is sometimes named, but often the tag which is considered the name is merely a type. For example, a story that tells about three different F-14 crashes may, according to MUC rules, produce three different entities named "F-14", whose only difference would be found in information not captured by the TE object.

Locations

Locations are defined as follows:

A location object consists of:

locale found in the text,

the country where the locale exists, and

the locale type: CITY, PROVINCE, COUNTRY, REGION, AIRPORT, or UNK.

 

The location object's locale slot is filled with the most specific reference to a location. For example, if the location were "Philadelphia, PA," the locale slot would be filled with "Philadelphia," the country would be "United States," and the locale type would be "CITY." The deficiency of this design is obvious: it fails to differentiate the actual location from any other city named "Philadelphia" in the country. An alternative design, which has been used for other NLToolset applications, contains a locale slot which holds the entire phrase describing the locale. Some examples are:

"at the checkpoint on Route 30"

"southwest of Miami"

"Wilmington, Delaware"

 

Additionally, the location object contains slots for whatever other information can be gleaned from the text or from on-line resources, such as a gazetteer. This includes slots for city, country, province, latitude/longitude, region, or water.
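
To make the design concrete, here is a small Python sketch of the MUC-7-style locale fill for the "Philadelphia, PA" example above; the tiny gazetteer dictionary is an illustrative stand-in for the on-line resources just mentioned.

    # Build a MUC-7-style location object from a "City, Province" phrase,
    # keeping only the most specific reference in the locale slot.
    GAZETTEER = {
        "philadelphia": {"country": "United States", "locale_type": "CITY"},
        "wilmington":   {"country": "United States", "locale_type": "CITY"},
    }

    def make_location_object(phrase):
        locale = phrase.split(",")[0].strip()        # most specific reference
        info = GAZETTEER.get(locale.lower(), {})
        return {"locale": locale,
                "locale_type": info.get("locale_type", "UNK"),
                "country": info.get("country", "")}

    print(make_location_object("Philadelphia, PA"))
    # {'locale': 'Philadelphia', 'locale_type': 'CITY', 'country': 'United States'}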

TIPSTER Research

AATM7 was developed with a focus on the investigation of a number of techniques involved in coreference resolution. Coreference resolution can be thought of as the identification and linking of all references to a particular entity. References may be in the form of names, pronouns, or noun phrases.

Syntax is frequently used by an author to associate a descriptive phrase with an entity. This can be seen in the following examples:

 

APPOSITIVE: "Lockheed Martin, an aerospace firm,"

PRENOMINAL: "the aerospace firm, Lockheed Martin"

NAME-MODIFIED HEAD NOUN: "the Lockheed Martin aerospace firm"

PREDICATE NOMINATIVE: "Lockheed Martin is an aerospace firm"
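
To show how tightly these constructions tie a descriptor to a name, the toy Python sketch below renders three of them as regular expressions (the name-modified head noun follows the same idea). The real NLToolset uses its own pattern language; these regexes are purely illustrative.

    # Crude regular-expression renderings of three of the constructions above.
    import re

    NAME = r"(?P<name>(?:[A-Z][a-z]+ )+[A-Z][a-z]+)"    # naive proper-name match
    DESC = r"(?P<desc>an? [a-z]+(?: [a-z]+)*)"          # naive descriptor match

    PATTERNS = {
        "appositive":           re.compile(NAME + r", " + DESC + r","),
        "prenominal":           re.compile(r"the (?P<desc>[a-z]+(?: [a-z]+)*), " + NAME),
        "predicate nominative": re.compile(NAME + r" is " + DESC),
    }

    EXAMPLES = {
        "appositive":           "Lockheed Martin, an aerospace firm, won the award.",
        "prenominal":           "the aerospace firm, Lockheed Martin",
        "predicate nominative": "Lockheed Martin is an aerospace firm",
    }

    for label, pattern in PATTERNS.items():
        m = pattern.search(EXAMPLES[label])
        print(label, "->", m.group("name"), ":", m.group("desc"))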

 

When an entity is referred to only by a descriptive phrase, finding its true identity is very challenging. The following sentence

"The president has announced that he will resign."

has varying degrees of import, depending on the sentence that precedes it:

"Coca Cola Company today revealed the future plans of its president, James Murphy."

"Impeachment hearings were scheduled to begin today against President Clinton."

An automatic system can use the information closely related by syntax to the entity, in this case the title "President" or the prenominal "its president", to identify the entity referred to by "the president." This is the heart of our current research. Our aim is to find all descriptive information closely related by syntax and to build a story-specific ontology for each entity, so that far-flung references that depend on this semantic information can be identified. As part of this research, the Template Element development keys were analyzed to determine how often the descriptors of an organization or person are directly associated by syntax. A surprisingly large number of descriptive phrases within the keys can be directly associated with an entity by way of syntax. Of a total of approximately 900 descriptors, 125 were organization descriptors and 775 were person descriptors, a disproportionate split, since there are actually more organization entities (985) than person entities (802) in the keys.

The following table shows the breakdown by category and entity type. "Association by Context" refers to descriptors found in titles, prenominal phrases, appositives, and predicate nominatives. "Association by Reference" refers to remote references to a named entity. "Un-named" refers to entities described by noun phrases alone, e.g. "a local bank."

 

Category                      Person        Organization

Association by Context        548 (71%)     33 (26%)
Association by Reference      103 (13%)     53 (42%)
Un-named                      119 (15%)     38 (30%)

Table 1: Training Set Analysis

This data supports the hypothesis that much reliable descriptive information can be obtained through syntactic association. This descriptive information can be associated with the entity object and then be used to help resolve associations by reference, in a manner similar to that used for organizations in the Lockheed Martin MUC-6 system, LOUELLA. This is the idea of a semantic filter, which was used to compare descriptive phrases with the semantic content of organization names, as in the following example.

"Buster Brown Shoes" => (buster brown shoes shoe footwear)

"the footwear maker" => (footwear maker make manufacturer)

Since person names rarely include semantic content, we must rely on other descriptive information to build the semantics, either through world knowledge stored in the system’s knowledge base or through associations found in the text itself.
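
A minimal sketch of the semantic filter in Python: expand both the name and the candidate descriptor into bags of semantic tokens, and accept the link when the bags intersect. The small expansion table stands in for the knowledge base; its entries are illustrative only.

    # Accept a name/descriptor link when their expanded token bags overlap.
    EXPANSIONS = {
        "shoes": {"shoe", "footwear"},
        "maker": {"make", "manufacturer"},
    }

    def semantic_tokens(phrase):
        tokens = set(phrase.lower().replace("the ", "").split())
        for t in list(tokens):
            tokens |= EXPANSIONS.get(t, set())
        return tokens

    def semantically_compatible(name, descriptor):
        return bool(semantic_tokens(name) & semantic_tokens(descriptor))

    # "Buster Brown Shoes"  => {buster, brown, shoes, shoe, footwear}
    # "the footwear maker"  => {footwear, maker, make, manufacturer}
    assert semantically_compatible("Buster Brown Shoes", "the footwear maker")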

As part of Lockheed Martin's TIPSTER research, the freeware Brill part-of-speech tagger was connected to the NLToolset to see whether it could streamline the process of writing patterns to find descriptors; standard NLToolset processing provides all possible parts of speech for each token, so a single tag per token promised simpler patterns. It was found that a package for finding and correctly linking the majority of person descriptors could be written in about a week by combining the information that Brill provides with that provided by the NLToolset, i.e. symbol name, semantic category, and possible parts of speech as found in the NLToolset's lexicon. The contrast between the descriptor scores for persons and organizations in the test set is striking.

 

DESCRIPTORS       RECALL    PRECISION

PERSON              61         55
ORGANIZATION        28         20

Table 2: Descriptor Scores

Finding artifacts and linking up all references to the same entity has proved especially challenging because of the unusual way that artifacts are described in text, and the way that the descriptions are categorized for MUC-7. For instance, "Boeing 747" and "F-14" are considered names, whereas "TWA Flight 800" is considered a descriptor. Under the TIPSTER research, a new algorithm was developed to find vehicles and resolve coreferences. The algorithm differs from that for organizations and people in that a match is assumed to belong to the most recently seen entity, unless there is some information to contradict this assumption. The possible types of contradictory information are: model information, manufacturer, military branch, airline, and flight number. Further, if the comparison reveals that one entity has military information and the other has airline information, there is a contradiction. In addition, the variable-binding feature of the NLToolset's pattern matching allows the developer to extract type information while finding the entities in the text; this type information helps the system to distinguish between entities during coreference resolution.
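
A sketch of this most-recent-antecedent rule with contradiction checking appears below; the attribute names follow the list in the text, while the record format and function names are our own.

    # Merge an artifact mention with the most recently seen artifact
    # unless some attribute contradicts the match.
    FEATURES = ("model", "manufacturer", "military_branch", "airline", "flight_number")

    def contradicts(a, b):
        # Disagreement on any attribute known to both mentions blocks the merge...
        for f in FEATURES:
            if a.get(f) and b.get(f) and a[f] != b[f]:
                return True
        # ...as does mixing military information with airline information.
        military = lambda x: bool(x.get("military_branch"))
        airline = lambda x: bool(x.get("airline") or x.get("flight_number"))
        return (military(a) and airline(b)) or (military(b) and airline(a))

    def resolve(mention, recent_artifacts):
        """Link a mention to the most recent non-contradictory artifact."""
        for candidate in reversed(recent_artifacts):
            if not contradicts(mention, candidate):
                return candidate
        return None   # no compatible antecedent; start a new entity

    flight800 = {"airline": "TWA", "flight_number": "800"}
    navy_f14 = {"model": "F-14", "military_branch": "Navy"}
    assert resolve({"model": "F-14"}, [flight800, navy_f14]) is navy_f14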

 

RESULTS ANALYSIS

 

Overall, AATM7's scores for MUC-7 are good. A few errors, as well as some quirks of the MUC-7 domain, significantly affected the scores for entity names and locations; these are discussed below. The artifact scores are significantly below the NLToolset's usual performance, due to the newness of this entity type, particularly the space vehicle artifacts. This capability remains a work in progress, as the need arises for our real-world applications.

* * * SUMMARY SCORES * * *

                 POS   ACT   COR  PAR  INC  MIS  SPU   NON  REC  PRE  UND  OVG  SUB  ERR

SUBTASK SCORES
entity
  artifact       197   241   165    0   17   15   59    12   84   68    8   24    9   36
  organization   866   910   800    0   33   33   77    11   92   88    4    8    4   15
  person         469   534   457    0    7    5   70     0   97   86    1   13    2   15
location
  airport         21    18     1    0   17    3    0     2    5    6   14    0   94   95
  city           226   226   197    0    9   20   20     2   87   87    9    9    4   20
  country        260   239   221    0   10   29    8     7   85   92   11    3    4   18
  province        52    53    42    0    6    4    5    19   81   79    8    9   13   26
  region          89    40    29    0   11   49    0     4   33   73   55    0   28   67
  unk             33    14     4    0   10   19    0     3   12   29   58    0   71   88
  water           12    10    10    0    0    2    0     1   83  100   17    0    0   17

OBJ SCORES
  location       693   626   554    0   32  107   40    18   80   88   15    6    5   24
  entity        1532  1685  1432    0   47   53  206    23   93   85    3   12    3   18

SLOT SCORES
location
  locale         697   626   511    0   75  111   40    24   73   82   16    6   13   31
  locale_type    693   600   504    0   63  126   33    38   73   84   18    6   11   31
  country        691   624   493    0   90  108   41    26   71   79   16    7   15   33
entity
  ent_name      1761  1731  1305    0  159  297  267    28   74   75   17   15   11   36
  ent_type      1532  1685  1422    0   57   53  206    23   93   84    3   12    4   18
  ent_descrip    680   819   338    0  175  167  306   585   50   41   25   37   34   66
  ent_categor   1532  1685  1340    0  139   53  206    53   87   80    3   12    9   23

ALL SLOTS       7586  7771  5913    0  758  915 1100  1061   78   76   12   14   11   32

F-MEASURES      P&R: 77.01    2P&R: 76.45    P&2R: 77.57

Table 3: Overall MUC-7 Scores
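
For reference, the three F-measures in these tables combine the ALL SLOTS recall (R) and precision (P) with the standard weighted formula, in which β weights recall relative to precision (β = 1 for P&R, β = 0.5 for 2P&R, β = 2 for P&2R):

    F_{\beta} = \frac{(\beta^{2} + 1)\,P\,R}{\beta^{2}\,P + R}

For the ALL SLOTS row above, with the displayed P = 76 and R = 78, F_1 = (2)(76)(78)/(76 + 78) ≈ 77.0; the reported 77.01 differs only because the scorer uses the unrounded recall and precision.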

Since the TE task spans four separate subtasks with very different characteristics, each was analyzed separately. The formal run keys were split into four sets: organization, person, artifact, and location keys; the formal run responses were split the same way. Each set was then scored with SAIC's version 3.3 of the MUC scoring program. This method removes the mapping ambiguity between entities of different types and allows an accurate analysis of the performance of each individual entity type. The results are described below.
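
As a sketch of the splitting step, assuming a simplified in-memory template format (the actual keys are text files consumed by the SAIC scorer):

    # Split a mixed set of key templates into the four per-type sets
    # described above.  The dictionary-based template format here is an
    # illustrative assumption.
    def split_by_type(templates):
        buckets = {"organization": [], "person": [], "artifact": [], "location": []}
        for t in templates:
            kind = "location" if "locale" in t else t["ent_type"].lower()
            buckets[kind].append(t)
        return buckets

    keys = [{"ent_type": "PERSON", "ent_name": ["James Murphy"]},
            {"locale": "Philadelphia", "locale_type": "CITY"}]
    print({k: len(v) for k, v in split_by_type(keys).items()})
    # {'organization': 0, 'person': 1, 'artifact': 0, 'location': 1}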

People

AATM7 found 97% of the person objects and identified 86% of the names correctly. The slot scores are high, even for the descriptor slot, which has traditionally scored below 50%. One problem that could very easily be resolved is the incorrect interpretation of expressions like "(NI FRX)" in the formal text. "NI" is a common first name in some languages, and AATM7 therefore interpreted all thirteen of these expressions as person names. This error accounted for 13 of the overgenerated or incorrect person names, or the equivalent of 2 points of precision.

 

Another area for improvement is in the descriptor slot. Twenty-six of AATM7’s person descriptors were marked incorrect because they contained only the head of the noun phrase and not the entire phrase, e.g. "commander" instead of "Columbia’s commander" and "manager" instead of "project manager." The descriptor rule package will be improved to better encompass the entire phrase. If these descriptors had been extracted correctly for the MUC-7 test, the descriptor recall and precision would have improved to 70 and 63, while the overall person scores would have improved to 89 recall, 79 precision, and 83.7 F-measure.

 

* * * SUMMARY SCORES * * *

                 POS   ACT   COR  PAR  INC  MIS  SPU   NON  REC  PRE  UND  OVG  SUB  ERR

SUBTASK SCORES
  person         469   560   457    0    0   12  103     0   97   82    3   18    0   20

OBJ SCORES
  entity         469   560   457    0    0   12  103     0   97   82    3   18    0   20

SLOT SCORES
entity
  ent_name       568   564   491    0   14   63   59     1   86   87   11   10    3   22
  ent_type       469   560   457    0    0   12  103     0   97   82    3   18    0   20
  ent_descrip    302   335   184    0   61   57   90   147   61   55   19   27   25   53
  ent_categor    469   560   444    0   13   12  103    28   95   79    3   18    3   22
  obj_status       0     0     0    0    0    0    0     4    0    0    0    0    0    0
  comment          0     0     0    0    0    0    0    13    0    0    0    0    0    0

ALL SLOTS       1808  2019  1576    0   88  144  355   193   87   78    8   18    5   27

F-MEASURES      P&R: 82.36    2P&R: 79.72    P&2R: 85.18

Table 4: Person Object Scores

 

Organizations

 

Organizations are complex entities to identify in text because organization names have a more complex structure than person names. A variation algorithm for one name may not work for another. For example, "Hughes" is a valid variation for "Hughes Aerospace, Inc." but "Space" is not a valid variation for "Space Technology Industries". An automatic system must, therefore, look at the surrounding context of variations and filter out those that are spurious.
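
The following Python sketch conveys the flavor of such filtering; the stoplist of generic head words is an illustrative assumption, and the actual NLToolset also examines the surrounding context of each candidate variation.

    # Generate short-form variations of an organization name, rejecting
    # leading words too generic to identify the organization.
    CORPORATE_SUFFIXES = {"inc.", "corp.", "co.", "ltd.", "sa"}
    GENERIC_WORDS = {"space", "international", "national", "general", "technology"}

    def name_variations(full_name):
        words = [w for w in full_name.replace(",", "").split()
                 if w.lower() not in CORPORATE_SUFFIXES]
        variations = set()
        # A leading word is a plausible short form only if it is not a
        # generic term that would match many unrelated organizations.
        if words and words[0].lower() not in GENERIC_WORDS:
            variations.add(words[0])
        return variations

    assert name_variations("Hughes Aerospace, Inc.") == {"Hughes"}
    assert name_variations("Space Technology Industries") == set()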

 

AATM7 found 780 of the 877 organizations in the formal test corpus. Points were lost in two areas. First, current performance on organization descriptors is woefully inadequate, in sharp contrast to performance on person descriptors; an effort is currently underway to improve it with the help of a part-of-speech tagger. Second, it was discovered that the mechanism for creating and linking variations of organization names was broken during the training period, with the result that 64 name variations were missed. When this problem was fixed, recall and precision for ent_name improved to 76 and 77, and overall organization recall and precision improved to 80 and 77.

 

* * * SUMMARY SCORES * * *

                 POS   ACT   COR  PAR  INC  MIS  SPU   NON  REC  PRE  UND  OVG  SUB  ERR

SUBTASK SCORES
  organization   865   889   800    0    0   65   89    12   92   90    8   10    0   16

OBJ SCORES
  entity         865   889   800    0    0   65   89    12   92   90    8   10    0   16

SLOT SCORES
entity
  ent_name      1062   984   742    0  111  209  131    25   70   75   20   13   13   38
  ent_type       865   889   800    0    0   65   89    12   92   90    8   10    0   16
  ent_descrip    196   265    54    0   44   98  167   126   28   20   50   63   45   85
  ent_categor    865   889   733    0   67   65   89    14   85   82    8   10    8   23
  obj_status       0     0     0    0    0    0    0    69    0    0    0    0    0    0
  comment          0     0     0    0    0    0    0    50    0    0    0    0    0    0

ALL SLOTS       2988  3027  2329    0  222  437  476   296   78   77   15   16    9   33

F-MEASURES      P&R: 77.44    2P&R: 77.14    P&2R: 77.74

Table 5: Organization Object Scores

 

Artifacts

AATM7's artifact performance suffers most in the area of entity names. It missed almost half of the artifact entities purely for lack of patterns with which to recognize them. This is a sign of the immaturity of the artifact packages and can be overcome by further development. Another problem, which caused the low precision, was the incorrect identification of the owner of an artifact as its name; this accounted for 38 of the spurious entity names and 2 points of precision. Since this is a new package, the coreference resolution is also not up to the NLToolset's usual performance. This is an ongoing research effort.

 

 

* * * SUMMARY SCORES * * *

                 POS   ACT   COR  PAR  INC  MIS  SPU   NON  REC  PRE  UND  OVG  SUB  ERR

SUBTASK SCORES
  artifact       197   236   165    0    0   32   71    12   84   70   16   30    0   38

OBJ SCORES
  entity         197   236   165    0    0   32   71    12   84   70   16   30    0   38

SLOT SCORES
entity
  ent_name       130   183    60    0   15   55  108     3   46   33   42   59   20   75
  ent_type       197   236   165    0    0   32   71    12   84   70   16   30    0   38
  ent_descrip    181   219    98    0   48   35   73   313   54   45   19   33   33   61
  ent_categor    197   236   165    0    0   32   71    12   84   70   16   30    0   38
  obj_status       0     0     0    0    0    0    0    27    0    0    0    0    0    0
  comment          0     0     0    0    0    0    0    46    0    0    0    0    0    0

ALL SLOTS        705   874   488    0   63  154  323   413   69   56   22   37   11   53

F-MEASURES      P&R: 61.81    2P&R: 58.08    P&2R: 66.05

Table 6: Artifact Object Scores

Locations

 

The NLToolset performs well at finding and disambiguating locations. Determining the country for a given location can be complicated, since many named locations exist in multiple countries. A small number of minor changes have been identified that would significantly boost the score to its normal level. One obvious problem AATM7 had was with airports: eleven occurrences of Kennedy Space Center were identified as locale type "CITY" instead of the correct type, "AIRPORT". This was caused by a simple inconsistency in our location processing. Fixing this one problem improved the airport-specific recall and precision to 57 and 67 respectively, and improved the overall precision by 1 percentage point.

 

The location recall for MUC-7 is slightly depressed because of some challenges which this particular domain presented. AATM7 was not configured to process planet names or other extra-terrestrial bodies as locations. This accounted for sixty-three missing items, at three slots per item; thirty-one of the missing items were occurrences of "earth" alone. This is reflected in the subtask scores for region and unk. By simply adding these locations to the NLToolset's knowledge base, recall and precision improved to 82 and 83 for the location object.

 

Another quirk of the MUC-7 domain was that adjectival forms of nation names were to be extracted as location objects if they were the only references to the nation in the text. In other words, if the text contains the phrase "the Italian satellite" but no other mention of Italy, a location object with the locale "Italian" would be extracted. This was not addressed in AATM7 and resulted in a loss of thirty-two location objects, at three slots per object. This feature could be added just for the MUC-7 test; it is unlikely that a real-world application would want this information extracted. If it is added, recall and precision for the location object rise to 86 and 84, with an overall F-measure of 85.

 

 

 

* * * SUMMARY SCORES * * *

                 POS   ACT   COR  PAR  INC  MIS  SPU   NON  REC  PRE  UND  OVG  SUB  ERR

SUBTASK SCORES
location
  airport         21    18     1    0   17    3    0     2    5    6   14    0   94   95
  city           226   226   197    0    9   20   20     2   87   87    9    9    4   20
  country        260   239   221    0   10   29    8     7   85   92   11    3    4   18
  province        52    53    42    0    6    4    5    19   81   79    8    9   13   26
  region          89    40    29    0   11   49    0     4   33   73   55    0   28   67
  unk             33    14     4    0   10   19    0     3   12   29   58    0   71   88
  water           12    10    10    0    0    2    0     1   83  100   17    0    0   17

OBJ SCORES
  location       693   626   554    0   32  107   40    18   80   88   15    6    5   24

SLOT SCORES
location
  locale         697   626   511    0   75  111   40    24   73   82   16    6   13   31
  locale_type    693   600   504    0   63  126   33    38   73   84   18    6   11   31
  country        691   624   493    0   90  108   41    26   71   79   16    7   15   33
  obj_status       0     0     0    0    0    0    0    23    0    0    0    0    0    0
  comment          0     0     0    0    0    0    0    52    0    0    0    0    0    0

ALL SLOTS       2081  1851  1508    0  228  345  115   163   72   81   17    6   13   31

F-MEASURES      P&R: 76.70    2P&R: 79.49    P&2R: 74.10

Table 7: Location Object Scores

 

 

WALKTHROUGH MESSAGE

 

Our overall score for the walkthrough message is slightly below our overall performance.

 

 

* * * SUMMARY SCORES * * *

                 POS   ACT   COR  PAR  INC  MIS  SPU   NON  REC  PRE  UND  OVG  SUB  ERR

SUBTASK SCORES
entity
  artifact         3     9     3    0    0    0    6     0  100   33    0   67    0   67
  organization    23    23    20    0    3    0    0     0   87   87    0    0   13   13
  person          10    12    10    0    0    0    2     0  100   83    0   17    0   17
location
  airport          0     0     0    0    0    0    0     0    0    0    0    0    0    0
  city             9     8     8    0    0    1    0     0   89  100   11    0    0   11
  country          6     6     6    0    0    0    0     0  100  100    0    0    0    0
  province         1     1     1    0    0    0    0     2  100  100    0    0    0    0
  region           3     2     2    0    0    1    0     0   67  100   33    0    0   33
  unk              0     0     0    0    0    0    0     0    0    0    0    0    0    0
  water            0     0     0    0    0    0    0     0    0    0    0    0    0    0

OBJ SCORES
  location        19    17    17    0    0    2    0     0   89  100   11    0    0   11
  entity          36    44    33    0    3    0    8     0   92   75    0   18    8   25

SLOT SCORES
location
  locale          19    17    16    0    1    2    0     1   84   94   11    0    6   16
  locale_type     19    17    17    0    0    2    0     2   89  100   11    0    0   11
  country         19    16    15    0    1    3    0     0   79   94   16    0    6   21
entity
  ent_name        41    46    28    0    7    6   11     0   68   61   15   24   20   46
  ent_type        36    44    33    0    3    0    8     0   92   75    0   18    8   25
  ent_descrip     19    25    12    0    4    3    9    17   63   48   16   36   25   57
  ent_categor     36    44    29    0    7    0    8     0   81   66    0   18   19   34

ALL SLOTS        189   209   150    0   23   16   36    33   79   72    8   17   13   33

F-MEASURES      P&R: 75.38    2P&R: 73.17    P&2R: 77.72

Table 8: Walkthrough Scores

Persons

AATM7 found all of the persons in the walkthrough document. Of the five person descriptors, it missed only two: it made a separate entity for one of the descriptors and found only part of the other. The other spurious person entity is really an organization ("ING Barings") that was mistaken for a person because "Ing" is in the first-names list. AATM7 also confused another organization ("Bloomberg Business") with a person because of the context ("the parent of"), but this was marked incorrect, rather than spurious, because it was mapped to the organization object in the keys.

Organizations

Of the twenty-three organization entities, AATM7 found twenty-one, missing "International Technology Underwriters" and "Axa SA." Two other organizations were typed incorrectly as people, as mentioned above. Five of the nine organization descriptors were found correctly. The remaining errors in the organization area result from the broken variation-linking mechanism discussed earlier.

Artifacts

AATM7 correctly identified all three of the artifacts in the walkthrough article; however, because it overgenerated, precision for this object is a low 33%. This was due to the previously discussed mistake in which an organization that owned a satellite was incorrectly identified as its name; in fact, the organizations "Intelsat" and "United States" account for five of the six spurious artifacts. Two of the three descriptors were identified correctly.

Locations

 

AATM7 correctly identified sixteen of the nineteen locations, but missed "Arlington," "China," and the "Central" part of "Central America." This was due to overzealous context-based filtering.

 

CONCLUSIONS

 

A cursory analysis of AATM7's MUC-7 scores revealed seven specific changes that would improve MUC-7 performance. Of these seven, five will be made in order to improve NLToolset performance. The sixth, adding extra-terrestrial bodies to the knowledge base, will be done to expand the NLToolset's reach. The seventh, making nation adjectives into locations, will not be done until a real-world application requires it.

 

If one were to make all of the changes specified, AATM7’s overall scores would be improved to:

 

RECALL    PRECISION    F-MEASURE

  83          78         80.42

The NLToolset continues to improve as it is applied to new problems, whether a real-world application or a standardized test. Its accuracy remains high and its speed is constantly improving, currently standing, in its compiled state, at under twenty seconds for an average document.

