Design choices
Chunk representation
The NER task is commonly viewed as a sequence prediction problem in which the goal is to assign the correct label to each token. Different ways of encoding chunk information in the label set yield different chunk representations. The two most popular schemes are BIO and BILOU.[1]
BIO
BIO stands for Beginning, Inside, and Outside (of a text segment). In a system that recognizes entity boundaries only, just three labels are used: B, I, and O. In a NERC system, entity classes are additionally encoded into the beginning and inside labels. Below is an example of the BIO scheme with entity classes.
Minjun | B-Person |
is | O |
from | O |
South | B-Location |
Korea | I-Location |
. | O |
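As a minimal sketch, BIO labels can be derived mechanically from token-level entity spans; the function name and span format below are illustrative, not from any standard library:

```python
def spans_to_bio(tokens, spans):
    """Convert entity spans (start, end, type), with end-exclusive
    token indices, into one BIO label per token."""
    labels = ["O"] * len(tokens)
    for start, end, etype in spans:
        labels[start] = f"B-{etype}"          # first token of the chunk
        for i in range(start + 1, end):
            labels[i] = f"I-{etype}"          # remaining tokens
    return labels

tokens = ["Minjun", "is", "from", "South", "Korea", "."]
spans = [(0, 1, "Person"), (3, 5, "Location")]
print(spans_to_bio(tokens, spans))
# ['B-Person', 'O', 'O', 'B-Location', 'I-Location', 'O']
```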
BILOU
Similar to but more detailed than BIO, BILOU encodes the Beginning, Inside, and Last tokens of multi-token chunks while distinguishing them from Unit-length (single-token) chunks. The same sentence is annotated differently under BILOU:
Minjun | U-Person |
is | O |
from | O |
South | B-Location |
Korea | L-Location |
. | O |
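Since BILOU only refines BIO at chunk boundaries, a BIO sequence can be rewritten into BILOU with a single pass; this is a sketch, not a reference implementation:

```python
def bio_to_bilou(labels):
    """Rewrite BIO labels as BILOU: a single-token chunk becomes U-,
    and the final token of a multi-token chunk becomes L-."""
    out = list(labels)
    for i, lab in enumerate(labels):
        nxt = labels[i + 1] if i + 1 < len(labels) else "O"
        if lab.startswith("B-") and nxt != "I-" + lab[2:]:
            out[i] = "U-" + lab[2:]   # chunk of length one
        elif lab.startswith("I-") and nxt != "I-" + lab[2:]:
            out[i] = "L-" + lab[2:]   # last token of the chunk
    return out

print(bio_to_bilou(["B-Person", "O", "O", "B-Location", "I-Location", "O"]))
# ['U-Person', 'O', 'O', 'B-Location', 'L-Location', 'O']
```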
For some datasets and methods, BILOU has been found to outperform the more widely used BIO.[1]
Non-local features
Context aggregation
Chieu and Ng aggregate the contexts in which a token appears across the document to support decision making.[2] Features are defined manually, for example:
- the longest capitalized sequence of words in the document which contains the current token
- the token appears before a company marker such as ltd. elsewhere in text
Ratinov and Roth used context windows of size 2 and also achieved improved results.[1]
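The second feature above can be sketched as a document-level binary feature; the marker list and function name here are illustrative assumptions:

```python
def company_marker_feature(tokens):
    """For each token, fire a binary feature if the same token appears
    immediately before a company marker anywhere in the document."""
    markers = {"ltd.", "inc.", "corp."}   # illustrative marker list
    before_marker = {
        tokens[i].lower()
        for i in range(len(tokens) - 1)
        if tokens[i + 1].lower() in markers
    }
    return [tok.lower() in before_marker for tok in tokens]

doc = ["Acme", "Ltd.", "said", "Acme", "will", "expand", "."]
print(company_marker_feature(doc))
# [True, False, False, True, False, False, False]
```

Note that the second occurrence of "Acme" receives the feature even though its local context ("said Acme will") carries no company marker; that is the point of aggregating context over the whole document.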
Two-stage prediction aggregation
Some instances of a token appear in easily identifiable contexts. Krishnan and Manning used the output of a baseline system as features for a second system and observed a relative error reduction of 12-13%.[3] A variation by Ratinov and Roth also improved accuracy over a baseline model.[1]
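The two-stage idea can be sketched as follows, assuming the baseline tagger's labels are already available; the feature names are hypothetical:

```python
from collections import Counter

def second_stage_features(tokens, baseline_labels):
    """Augment each token's features with the majority label the
    baseline assigned to occurrences of the same token in the document."""
    votes = {}
    for tok, lab in zip(tokens, baseline_labels):
        votes.setdefault(tok.lower(), Counter())[lab] += 1
    feats = []
    for tok, lab in zip(tokens, baseline_labels):
        majority = votes[tok.lower()].most_common(1)[0][0]
        feats.append({"baseline": lab, "token_majority": majority})
    return feats

# A token the baseline tagged inconsistently inherits the document-level vote:
feats = second_stage_features(["Acme", "said", "Acme"], ["B-ORG", "O", "O"])
print(feats[2])
```

A second-stage classifier trained on such features can correct instances where the baseline mislabeled a token in a hard context but got it right in an easy one elsewhere.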
Extended prediction history
It has been observed that names at the beginning of a document tend to be recognized more easily and match gazetteers more often. Ratinov and Roth therefore used the predicted labels of previous tokens as features. For example, if "Australia" was assigned "L-ORG" 2 times and "L-LOC" 3 times earlier in the document, the prediction-history feature will be (L-ORG: 2/5, L-LOC: 3/5).
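Computing the history feature for the "Australia" example above can be sketched as (function name and history format are assumptions for illustration):

```python
from collections import Counter

def history_feature(token, history):
    """Relative frequency of each label previously predicted for this
    token earlier in the document; history is a list of (token, label)."""
    counts = Counter(lab for tok, lab in history if tok == token)
    total = sum(counts.values())
    return {lab: n / total for lab, n in counts.items()} if total else {}

history = [("Australia", "L-ORG")] * 2 + [("Australia", "L-LOC")] * 3
print(history_feature("Australia", history))
# {'L-ORG': 0.4, 'L-LOC': 0.6}
```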
External knowledge
Gazetteers
Unlabeled text
External references
- Doing Named Entity Recognition? Don't optimize for F1 blog post by Christopher Manning
- TutorialNamedEntityChunkingClassifier - cleartk
References
- [1] Ratinov, L., & Roth, D. (2009, June). Design challenges and misconceptions in named entity recognition. In Proceedings of the Thirteenth Conference on Computational Natural Language Learning (pp. 147-155). Association for Computational Linguistics.
- [2] Chieu, H. L., & Ng, H. T. (2002, August). Named entity recognition: a maximum entropy approach using global information. In Proceedings of the 19th International Conference on Computational Linguistics - Volume 1 (pp. 1-7). Association for Computational Linguistics.
- [3] Krishnan, V., & Manning, C. D. (2006, July). An effective two-stage model for exploiting non-local dependencies in named entity recognition. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics (pp. 1121-1128). Association for Computational Linguistics.