Main reference: Chang et al. (2008)[1] from Dan Roth's group.

CCM (Constrained Conditional Model) is an NLP modeling paradigm in which the objective function is expressed as a linear combination of feature functions and constraint-violation penalties:

$f(x, y) = \sum_i w_i \phi_i (x,y) - \sum_j \rho_j C_j(x,y)$

At test time, an annotation is obtained by maximizing the objective function:

$y^* = \operatorname*{argmax}_{y \in \mathcal{Y}} f(x,y)$

Notice that the inference is global: when the output has many components (say, the POS tags of a sentence, or the arguments of a predicate), they are chosen to jointly optimize the objective function. In other words, they are decided simultaneously. This is different from, e.g., transition systems, which assign one piece of the output at a time.
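As a concrete illustration, here is a minimal sketch of the objective and of global (exhaustive) inference over a toy tag space. The lexicon feature, the "must contain a verb" constraint, and the weights are all hypothetical, invented for this example; real systems learn the weights and use tractable inference (e.g. ILP) rather than enumeration.

```python
from itertools import product

# Toy CCM sketch (hypothetical features, constraint, and weights):
#   f(x, y) = sum_i w_i * phi_i(x, y) - sum_j rho_j * C_j(x, y)

TAGS = ["DET", "NOUN", "VERB"]

def phi_lexicon(x, y):
    """Feature: number of (word, tag) pairs found in a tiny toy lexicon."""
    lexicon = {("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")}
    return sum((w, t) in lexicon for w, t in zip(x, y))

def c_no_verb(x, y):
    """Constraint violation: 1 if the tag sequence contains no verb."""
    return 0 if "VERB" in y else 1

def f(x, y, w=1.0, rho=2.0):
    """CCM objective: weighted features minus constraint penalties."""
    return w * phi_lexicon(x, y) - rho * c_no_verb(x, y)

def decode(x):
    """Global inference: argmax over all tag sequences (exhaustive search)."""
    return max(product(TAGS, repeat=len(x)), key=lambda y: f(x, y))

print(decode(["the", "dog", "barks"]))  # -> ('DET', 'NOUN', 'VERB')
```

The exhaustive search stands in for the argmax in the equation above; it is exponential in sentence length, which is why practical CCM systems hand the argmax to a general-purpose solver.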

CCM can encode dependencies between output parts, but only to a limited degree; richer dependency structures make inference intractable. For example, Rizzolo and Roth (2010)[2] show how to encode Hidden Markov Models in this framework.

|  | CCM | Transition-based |
|---|---|---|
| **Inference** | + no error propagation<br>− harder to implement, but there are general-purpose solvers (e.g. ILP solvers)<br>− slower | − suffers from error propagation<br>+ easier to implement<br>+ faster (for greedy decoding, but also beam search) |
| **Features** | − restricted set of features (otherwise the model becomes intractable) | + rich set of features |
| **Examples** | anything that uses LBJ (Rizzolo and Roth, 2010)[2] | transition-based dependency parsing, e.g. the MALT parser (Nivre et al., 2006)[3] |
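The error-propagation row above can be demonstrated on a toy sequence model (all scores below are hypothetical, chosen only to expose the effect): greedy, transition-style decoding commits to the locally best first tag and can no longer reach the globally best sequence, whereas joint argmax inference finds it.

```python
from itertools import product

# Hypothetical toy scores: two positions, two tags.
TAGS = ["A", "B"]
emission = [{"A": 1.0, "B": 0.9}, {"A": 0.0, "B": 0.0}]
transition = {("A", "A"): 0.0, ("A", "B"): 0.0,
              ("B", "A"): 0.0, ("B", "B"): 2.0}

def score(y):
    """Total score: per-position emissions plus pairwise transitions."""
    s = sum(emission[i][t] for i, t in enumerate(y))
    s += sum(transition[(a, b)] for a, b in zip(y, y[1:]))
    return s

def greedy():
    """Transition-style decoding: commit to one tag at a time."""
    y = [max(TAGS, key=lambda t: emission[0][t])]
    for i in range(1, len(emission)):
        y.append(max(TAGS, key=lambda t: emission[i][t] + transition[(y[-1], t)]))
    return tuple(y)

def global_argmax():
    """CCM-style global inference: jointly optimize the whole sequence."""
    return max(product(TAGS, repeat=len(emission)), key=score)

print(greedy())         # ('A', 'A'): locked into 'A' early, total score 1.0
print(global_argmax())  # ('B', 'B'): total score 2.9
```

Greedy decoding picks "A" at position 0 (1.0 > 0.9) and thereby forfeits the large ("B", "B") transition bonus; the global argmax trades a slightly worse first emission for a much better overall sequence.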

## References

1. Chang, M., Ratinov, L., Rizzolo, N., & Roth, D. (2008). Learning and Inference with Constraints. In Proceedings of AAAI.
2. Rizzolo, N., & Roth, D. (2010). Learning Based Java for Rapid Development of NLP Systems. In Proceedings of LREC, 957–964. http://www.lrec-conf.org/proceedings/lrec2010/pdf/747_Paper.pdf
3. Nivre, J., Hall, J., & Nilsson, J. (2006). MaltParser: A Data-Driven Parser-Generator for Dependency Parsing. In Proceedings of LREC, 2216–2219.