Design choices

Chunk representation

The NER task is commonly viewed as a sequential prediction problem in which we aim to assign the correct label to each token. Different ways of encoding chunk information in a set of labels yield different chunk representations. The two most popular schemes are BIO and BILOU.[1]

BIO

BIO stands for Beginning, Inside, and Outside (of a text segment). A system that recognizes only entity boundaries uses just three labels: B, I, and O, while an NERC system encodes entity classes into the beginning and inside labels. Below is an example of the BIO scheme with entity classes.

Minjun B-Person
is O
from O
South B-Location
Korea I-Location
. O
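
To make the scheme concrete, below is a minimal Python sketch that decodes a BIO label sequence back into entity spans; the function name and the exclusive-end span convention are illustrative choices, not taken from the cited work.

def bio_to_spans(labels):
    """Decode a BIO label sequence into (entity_type, start, end) spans,
    with `end` exclusive."""
    spans = []
    start, etype = None, None  # open chunk, if any
    for i, label in enumerate(labels):
        continues = label.startswith("I-") and etype == label[2:]
        if start is not None and not continues:
            spans.append((etype, start, i))  # close the open chunk
            start, etype = None, None
        if label.startswith("B-") or (label.startswith("I-") and start is None):
            # an I- tag without a matching open chunk is treated as a start
            start, etype = i, label[2:]
    if start is not None:
        spans.append((etype, start, len(labels)))
    return spans

print(bio_to_spans(["B-Person", "O", "O", "B-Location", "I-Location", "O"]))
# [('Person', 0, 1), ('Location', 3, 5)]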

BILOU

Similar to but more detailed than BIO, BILOU encodes the Beginning, Inside, and Last tokens of multi-token chunks while distinguishing them from Unit-length chunks. The same sentence is annotated differently in BILOU:

Minjun U-Person
is O
from O
South B-Location
Korea L-Location
. O

For some datasets and methods, BILOU has been found to outperform the more widely used BIO.[1]
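
The re-encoding from BIO to BILOU is mechanical. A minimal Python sketch (the function name is ours), which reproduces the example above:

def bio_to_bilou(labels):
    """Re-encode a BIO label sequence as BILOU."""
    bilou = []
    for i, label in enumerate(labels):
        nxt = labels[i + 1] if i + 1 < len(labels) else "O"
        chunk_continues = nxt.startswith("I-") and nxt[2:] == label[2:]
        if label.startswith("B-"):
            # a one-token chunk becomes a Unit
            bilou.append(label if chunk_continues else "U-" + label[2:])
        elif label.startswith("I-"):
            # the final token of a chunk becomes Last
            bilou.append(label if chunk_continues else "L-" + label[2:])
        else:
            bilou.append("O")
    return bilou

print(bio_to_bilou(["B-Person", "O", "O", "B-Location", "I-Location", "O"]))
# ['U-Person', 'O', 'O', 'B-Location', 'L-Location', 'O']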

Non-local features

Context aggregation

Chieu and Ng gather the contexts in which a token appears throughout a document to support decision making.[2] Features are defined manually, for example:

  • the longest capitalized sequence of words in the document that contains the current token
  • whether the token appears before a company marker such as "ltd." elsewhere in the text

Ratinov and Roth used context windows of size 2 and also achieved improved results.[1]
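
A sketch of document-level context aggregation features in the spirit of Chieu and Ng follows; the exact feature definitions, the marker list, and the function name here are illustrative assumptions, not the paper's.

def context_aggregation_features(doc_tokens, index):
    """Two document-level features for the token at `index`."""
    token = doc_tokens[index]
    # Feature 1: length of the longest capitalized run anywhere in the
    # document that contains an occurrence of the current token.
    longest, run = 0, []
    for t in doc_tokens + ["<end>"]:  # sentinel flushes the final run
        if t[:1].isupper():
            run.append(t)
        else:
            if token in run:
                longest = max(longest, len(run))
            run = []
    # Feature 2: does the token precede a company marker such as "ltd."
    # anywhere in the document? (marker list is an illustrative guess)
    markers = {"ltd.", "ltd", "inc.", "inc", "corp.", "corp"}
    before_marker = any(
        t == token and i + 1 < len(doc_tokens)
        and doc_tokens[i + 1].lower() in markers
        for i, t in enumerate(doc_tokens)
    )
    return {"longest_cap_run": longest, "before_company_marker": before_marker}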

Two-stage prediction aggregation

Some instances of a token appear in easily identifiable contexts. Krishnan and Manning used the output of a baseline system as features for a second system and observed a relative error reduction of 12–13%.[3] A variant of this approach by Ratinov and Roth also improved accuracy over a baseline model.[1]
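
A minimal sketch of the two-stage idea, assuming a simple token-majority aggregation over the baseline's predictions; the feature names are illustrative, not taken from Krishnan and Manning.

from collections import Counter

def second_stage_features(tokens, baseline_labels, index):
    """Features for token `index` derived from the baseline system's
    predictions on *other* occurrences of the same token."""
    token = tokens[index]
    votes = Counter(
        label
        for i, (t, label) in enumerate(zip(tokens, baseline_labels))
        if t == token and i != index
    )
    majority = votes.most_common(1)[0][0] if votes else "NONE"
    return {
        "baseline_label_here": baseline_labels[index],
        "baseline_majority_elsewhere": majority,
    }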

Extended prediction history

Names at the beginning of a document have been observed to be recognized more easily and to match gazetteers more often. Ratinov and Roth therefore used the predicted labels of previous tokens as features. For example, if "Australia" had been assigned "L-ORG" twice and "L-LOC" three times earlier in the document, the prediction-history feature would be (L-ORG: 2/5, L-LOC: 3/5).
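
A sketch of the prediction-history feature, using the "Australia" example from the text; the data structures are our choice, not the paper's.

from collections import Counter

def prediction_history_feature(token, history):
    """Relative frequency of the labels previously predicted for `token`
    in the current document; `history` maps tokens to label Counters."""
    counts = history.get(token, Counter())
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}

history = {"Australia": Counter({"L-ORG": 2, "L-LOC": 3})}
print(prediction_history_feature("Australia", history))
# {'L-ORG': 0.4, 'L-LOC': 0.6}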

External knowledge

Gazetteers

Unlabeled text

External references

References

  1. Ratinov, L., & Roth, D. (2009). Design challenges and misconceptions in named entity recognition. In Proceedings of the Thirteenth Conference on Computational Natural Language Learning (pp. 147–155). Association for Computational Linguistics.
  2. Chieu, H. L., & Ng, H. T. (2002). Named entity recognition: A maximum entropy approach using global information. In Proceedings of the 19th International Conference on Computational Linguistics (pp. 1–7). Association for Computational Linguistics.
  3. Krishnan, V., & Manning, C. D. (2006). An effective two-stage model for exploiting non-local dependencies in named entity recognition. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics (pp. 1121–1128). Association for Computational Linguistics.