Design choices
Chunk representation
The NER task is commonly viewed as a sequential prediction problem in which we aim to assign the correct label to each token. Different ways of encoding this information in a set of labels give rise to different chunk representations. The two most popular schemes are BIO and BILOU.
BIO stands for Beginning, Inside and Outside (of a text segment). In a system that recognizes entity boundaries only, just three labels are used: B, I and O. In a NERC system, entity classes are additionally encoded into the beginning and inside labels. Below is an example of the BIO scheme with entity classes.
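The original example did not survive extraction, so here is a hypothetical sentence tagged in the BIO scheme with entity classes (the sentence and the PER/ORG/LOC class labels are illustrative assumptions, not taken from the original):

```python
# A hypothetical sentence in the BIO scheme with entity classes.
# B- marks the first token of a chunk, I- a subsequent token inside
# the same chunk, and O a token outside any chunk.
tokens = ["John", "Smith", "works", "at", "Acme", "Corp", "in", "London", "."]
bio    = ["B-PER", "I-PER", "O", "O", "B-ORG", "I-ORG", "O", "B-LOC", "O"]

for tok, lab in zip(tokens, bio):
    print(f"{tok}\t{lab}")
```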
Similar to but more detailed than BIO, BILOU encodes the Beginning, Inside and Last tokens of multi-token chunks while differentiating them from Unit-length (single-token) chunks. The same sentence is annotated differently in BILOU:
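Again the original annotated example is missing, so here is a hedged sketch using a hypothetical sentence: a small converter that derives BILOU labels from BIO labels, which also makes the relationship between the two schemes explicit (the function and sentence are illustrative, not from the cited papers):

```python
def bio_to_bilou(labels):
    """Convert BIO labels to BILOU: the final token of a multi-token
    chunk becomes L-, and a single-token chunk becomes U-."""
    out = []
    for i, lab in enumerate(labels):
        nxt = labels[i + 1] if i + 1 < len(labels) else "O"
        if lab == "O":
            out.append("O")
            continue
        # The chunk continues only if the next label is I- of the same class.
        continues = nxt == "I-" + lab[2:]
        if lab.startswith("B-"):
            out.append(lab if continues else "U-" + lab[2:])
        else:  # an I- label
            out.append(lab if continues else "L-" + lab[2:])
    return out

tokens = ["John", "Smith", "works", "at", "Acme", "Corp", "in", "London", "."]
bio    = ["B-PER", "I-PER", "O", "O", "B-ORG", "I-ORG", "O", "B-LOC", "O"]
# "Smith" and "Corp" become L- labels; single-token "London" becomes U-LOC.
print(list(zip(tokens, bio_to_bilou(bio))))
```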
For some datasets and methods, BILOU has been found to outperform the more widely used BIO.
Non-local features
Context aggregation
Chieu and Ng gather the contexts in which a token appears across a document to support decision making. Features are defined manually, for example:
- the longest capitalized sequence of words in the document which contains the current token
- the token appears elsewhere in the text before a company marker such as ltd.
Ratinov and Roth used context windows of size 2 and also achieved improved results.
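The idea above can be sketched as follows: for every occurrence of a token type in a document, collect the words within a small window and share them across all occurrences, so an informative context anywhere (e.g. a following "ltd.") becomes a feature everywhere. This is a minimal illustrative sketch, not the cited papers' feature set; the function name and window size are assumptions:

```python
from collections import defaultdict

def aggregate_contexts(doc_tokens, window=2):
    """For each token type, aggregate the words seen within +/- `window`
    positions across ALL of its occurrences in the document."""
    contexts = defaultdict(set)
    for i, tok in enumerate(doc_tokens):
        for j in range(max(0, i - window), min(len(doc_tokens), i + window + 1)):
            if j != i:
                contexts[tok.lower()].add(doc_tokens[j].lower())
    return contexts

doc = "Acme shares rose . Acme ltd. hired staff".split()
feats = aggregate_contexts(doc)
# "ltd." follows only the second "Acme", but the aggregated feature set
# makes it available when labeling the first occurrence too.
```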
Two-stage prediction aggregation
Some instances of a token appear in easily identifiable contexts. Krishnan and Manning used the output of a baseline system as features for a second system and observed a relative error reduction of 12-13%. A variation of this approach by Ratinov and Roth also improved accuracy over a baseline model.
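A hedged sketch of the two-stage idea: run a baseline tagger over the document, then give the second-stage model, for each token, the label the baseline assigned most often to that token anywhere in the document. The function name and feature shape are assumptions for illustration, not the papers' actual code:

```python
from collections import Counter, defaultdict

def second_stage_features(tokens, baseline_labels):
    """Aggregate a first-stage (baseline) tagger's predictions:
    each token also carries the label the baseline assigned most
    often to that token elsewhere in the document."""
    votes = defaultdict(Counter)
    for tok, lab in zip(tokens, baseline_labels):
        votes[tok][lab] += 1
    return [{"token": tok,
             "baseline": lab,
             "doc_majority": votes[tok].most_common(1)[0][0]}
            for tok, lab in zip(tokens, baseline_labels)]

tokens = ["Rabin", "met", "reporters", ".", "Rabin", "spoke", ".", "Rabin", "left"]
labels = ["B-PER", "O", "O", "O", "B-PER", "O", "O", "O", "O"]  # third mention missed
feats = second_stage_features(tokens, labels)
# The third "Rabin" was tagged O by the baseline, but its doc_majority
# feature is B-PER, helping the second stage correct the miss.
```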
Extended prediction history
It has been observed that names near the beginning of a document tend to be recognized more easily and to match gazetteers more often. Ratinov and Roth therefore used the predicted labels of previous tokens as features. For example, if "Australia" was previously assigned "L-ORG" 2 times and "L-LOC" 3 times, the prediction-history feature would be (L-ORG: 2/5, L-LOC: 3/5).
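The worked example above can be computed directly. This sketch (the function name and input structure are assumptions; only the "Australia" counts come from the text) turns a token's earlier predictions into relative label frequencies:

```python
from collections import Counter

def prediction_history(token, past_predictions):
    """Relative frequency of each label previously assigned to `token`
    earlier in the text, given (token, label) pairs seen so far."""
    counts = Counter(lab for tok, lab in past_predictions if tok == token)
    total = sum(counts.values())
    return {lab: n / total for lab, n in counts.items()} if total else {}

past = [("Australia", "L-ORG")] * 2 + [("Australia", "L-LOC")] * 3
print(prediction_history("Australia", past))  # {'L-ORG': 0.4, 'L-LOC': 0.6}
```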
External knowledge
Unlabeled text
External references
- Doing Named Entity Recognition? Don't optimize for F1 blog post by Christopher Manning
- TutorialNamedEntityChunkingClassifier - cleartk
- Ratinov, L., & Roth, D. (2009, June). Design challenges and misconceptions in named entity recognition. In Proceedings of the Thirteenth Conference on Computational Natural Language Learning (pp. 147-155). Association for Computational Linguistics.
- Chieu, H. L., & Ng, H. T. (2002, August). Named entity recognition: a maximum entropy approach using global information. In Proceedings of the 19th International Conference on Computational Linguistics, Volume 1 (pp. 1-7). Association for Computational Linguistics.
- Krishnan, V., & Manning, C. D. (2006, July). An effective two-stage model for exploiting non-local dependencies in named entity recognition. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics (pp. 1121-1128). Association for Computational Linguistics.