Main reference: Chang et al. (2008) from Dan Roth's group.
A Constrained Conditional Model (CCM) is an NLP modeling paradigm in which the objective function is a linear combination of feature functions and constraints.
At test time, an annotation is obtained by maximizing the objective function over the possible outputs.
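As a sketch in the notation commonly used for CCMs, the objective and the test-time prediction can be written as below. The symbols are assumptions, not taken from this page: $w_i$ weights the feature function $\phi_i$, $\rho_k$ is the penalty for violating constraint $C_k$, and $d_{C_k}$ measures the degree of violation.

```latex
f(x, y) = \sum_i w_i \, \phi_i(x, y) \;-\; \sum_k \rho_k \, d_{C_k}(x, y)

\hat{y} = \operatorname*{arg\,max}_{y} \; f(x, y)
```

A hard constraint corresponds to letting $\rho_k \to \infty$, so that any output violating it is excluded.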
Note that inference is global: when the output contains many components (say, the POS tags of a sentence or the arguments of a predicate), they are chosen to jointly optimize one function. In other words, they are decided simultaneously. This differs from, e.g., transition systems, which assign one piece of the output at a time.
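The difference between deciding all components jointly and deciding them one at a time can be illustrated with a toy tagger. This is a minimal sketch, not code from any of the cited systems: the tag set, the integer scores in `UNARY` and `PAIR`, and the function names are all hypothetical, and exhaustive search stands in for a real global solver.

```python
from itertools import product

TAGS = ("N", "V")

# Hypothetical integer scores for a two-token input (illustration only).
UNARY = [{"N": 10, "V": 8}, {"N": 2, "V": 3}]          # score of a tag at each position
PAIR = {("N", "N"): 0, ("N", "V"): -10,                # score of two adjacent tags
        ("V", "N"): 0, ("V", "V"): 10}


def score(tags):
    """Total score of a complete tag sequence: unary terms plus adjacent-pair terms."""
    total = sum(UNARY[i][t] for i, t in enumerate(tags))
    total += sum(PAIR[(a, b)] for a, b in zip(tags, tags[1:]))
    return total


def global_decode():
    """Global inference: pick the complete sequence with the highest total score."""
    return max(product(TAGS, repeat=len(UNARY)), key=score)


def greedy_decode():
    """Greedy inference: commit to one tag at a time, left to right, never revisiting."""
    tags = []
    for unary in UNARY:
        def local(t):
            pair = PAIR[(tags[-1], t)] if tags else 0
            return unary[t] + pair
        tags.append(max(TAGS, key=local))
    return tuple(tags)


if __name__ == "__main__":
    for name, decode in (("global", global_decode), ("greedy", greedy_decode)):
        y = decode()
        print(name, y, score(y))
```

With these scores, the greedy decoder commits to `N` for the first token because it looks best locally, and ends with `("N", "N")` at a total of 12. The global decoder finds `("V", "V")` at a total of 21. Greedy never recovers from its first choice, which is the error propagation that global inference avoids.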
CCM can encode dependencies between output parts, as long as they are not too complicated (otherwise inference becomes intractable). For example, Rizzolo and Roth (2010) show how to encode hidden Markov models in this framework.
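To see why a hidden Markov model fits a linear objective, note that its log joint probability is a sum of log parameters, one per start, transition, and emission event. The sketch below is not the LBJ encoding from Rizzolo and Roth (2010). It is a hypothetical illustration with made-up probabilities, an invented constraint ("at least one V"), and an assumed penalty parameter `rho`, decoded by exhaustive search.

```python
import math
from collections import Counter
from itertools import product

TAGS = ("N", "V")

# Hypothetical HMM parameters (illustration only).
START = {"N": 0.7, "V": 0.3}
TRANS = {("N", "N"): 0.4, ("N", "V"): 0.6, ("V", "N"): 0.8, ("V", "V"): 0.2}
EMIT = {("N", "fish"): 0.6, ("N", "swim"): 0.4, ("V", "fish"): 0.3, ("V", "swim"): 0.7}

# One weight per indicator feature: the log of the matching HMM parameter.
WEIGHTS = {("start", t): math.log(p) for t, p in START.items()}
WEIGHTS.update({("trans", a, b): math.log(p) for (a, b), p in TRANS.items()})
WEIGHTS.update({("emit", t, w): math.log(p) for (t, w), p in EMIT.items()})


def features(words, tags):
    """Count how often each indicator feature fires in the pair (words, tags)."""
    phi = Counter({("start", tags[0]): 1})
    phi.update(("trans", a, b) for a, b in zip(tags, tags[1:]))
    phi.update(("emit", t, w) for t, w in zip(tags, words))
    return phi


def linear_score(words, tags):
    """Linear combination of feature counts and weights."""
    return sum(WEIGHTS[f] * n for f, n in features(words, tags).items())


def hmm_log_joint(words, tags):
    """log P(words, tags) computed directly from the HMM factorization."""
    logp = math.log(START[tags[0]]) + math.log(EMIT[(tags[0], words[0])])
    for i in range(1, len(words)):
        logp += math.log(TRANS[(tags[i - 1], tags[i])])
        logp += math.log(EMIT[(tags[i], words[i])])
    return logp


def violations(tags):
    """Hypothetical constraint: the sequence must contain at least one V."""
    return 0 if "V" in tags else 1


def decode(words, rho=0.0):
    """Exhaustive argmax of the linear score minus rho times the violation count."""
    return max(product(TAGS, repeat=len(words)),
               key=lambda tags: linear_score(words, tags) - rho * violations(tags))
```

For the input `("fish", "fish")`, `linear_score` agrees with `hmm_log_joint` on every tag sequence, so with `rho=0.0` decoding reduces to ordinary HMM decoding and returns `("N", "N")`. With `rho=10.0` the constraint penalty outweighs the small gap in log probability and the result changes to `("N", "V")`.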
| ||CCM (global inference)||Transition systems|
|Inference||+ no error propagation||- suffers from error propagation|
| ||- harder to implement, but general-purpose solvers exist (e.g. ILP solvers)||+ easier to implement|
| ||- slower||+ faster (with greedy decoding, and also with beam search)|
|Features||- restricted set of features (otherwise the model becomes intractable)||+ rich set of features|
|Examples||anything that uses LBJ (Rizzolo and Roth, 2010)||transition-based dependency parsing, e.g. MaltParser (Nivre et al., 2006)|
- Chang, M., Ratinov, L., Rizzolo, N., & Roth, D. (2008). Learning and Inference with Constraints. In Proceedings of AAAI.
- Rizzolo, N., & Roth, D. (2010). Learning Based Java for Rapid Development of NLP Systems. Proceedings of the Language Resources and Evaluation Conference, 957–964. Retrieved from http://www.lrec-conf.org/proceedings/lrec2010/pdf/747_Paper.pdf
- Nivre, J., Hall, J., & Nilsson, J. (2006). MaltParser: A data-driven parser-generator for dependency parsing. In LREC 2006 (Vol. 6, pp. 2216–2219).