Exponential language model

An exponential language model, also called a maximum entropy language model, uses the following formula to express the conditional probability of a word $w_i$ given its context $h_i$:

P(w_i \mid h_i) = \frac{1}{Z(h_i)} \exp\left(\sum_j \lambda_j f_j(h_i, w_i)\right),

where $\lambda_j$ are the parameters, $f_j(h_i, w_i)$ are arbitrary functions of the pair $(h_i, w_i)$, and $Z(h_i)$ is a normalization factor obtained by summing over every word $w$ in the vocabulary $V$:

Z(h) = \sum_{w \in V} \exp\left(\sum_j \lambda_j f_j(h, w)\right).
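The two formulas above can be illustrated with a minimal Python sketch. The function name `exponential_lm_prob`, the toy vocabulary, the two indicator features, and the weight values are all hypothetical and chosen only for illustration; they do not come from the cited papers.

```python
import math

def exponential_lm_prob(word, history, vocab, features, lambdas):
    """Return P(word | history) = exp(sum_j lambda_j f_j(h, w)) / Z(h)."""
    def score(w):
        # Weighted feature sum: sum_j lambda_j * f_j(history, w)
        return sum(lam * f(history, w) for lam, f in zip(lambdas, features))

    scores = {w: score(w) for w in vocab}
    # Subtracting the maximum score avoids overflow in exp();
    # the shift cancels between numerator and Z(h).
    shift = max(scores.values())
    z = sum(math.exp(s - shift) for s in scores.values())  # Z(h), up to the shift
    return math.exp(scores[word] - shift) / z

# Hypothetical toy setup (assumed values, for illustration only).
vocab = ["cat", "dog", "sat"]
features = [
    # Fires when the previous word is "cat" and the predicted word is "sat".
    lambda h, w: 1.0 if h and h[-1] == "cat" and w == "sat" else 0.0,
    # Fires whenever the predicted word is "dog", regardless of context.
    lambda h, w: 1.0 if w == "dog" else 0.0,
]
lambdas = [1.5, 0.5]
history = ("the", "cat")

probs = {w: exponential_lm_prob(w, history, vocab, features, lambdas) for w in vocab}
```

Because $Z(h)$ sums over the entire vocabulary, the values in `probs` add up to 1 for any choice of weights. The same sum is what makes normalization costly when $V$ is large.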

The parameters are learned from the training data according to the maximum entropy principle. The approach was first introduced into language modeling by Della Pietra et al. (1992)^[1] and was later investigated systematically by Rosenfeld (1996)^[2].
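
One common way to state the maximum entropy principle in this setting, given here as a sketch and not necessarily the exact formulation of the cited papers, is this: among all models whose expected feature values match those observed in the training data, choose the one with the greatest entropy. Writing $\tilde{P}$ for the empirical distribution of the training data (a symbol introduced here for illustration), the constraints are

\sum_{h} \tilde{P}(h) \sum_{w \in V} P(w \mid h) f_j(h, w) = \sum_{h, w} \tilde{P}(h, w) f_j(h, w) \quad \text{for all } j.

The model that satisfies these constraints with maximal entropy has the exponential form given above, and its parameters $\lambda_j$ coincide with the maximum likelihood estimates for that exponential family.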

References

[1] Stephen A. Della Pietra, Vincent J. Della Pietra, Robert L. Mercer, and Salim Roukos. Adaptive language modeling using minimum discriminant estimation. In Proceedings of the Workshop on Speech and Natural Language, pages 103–106, 1992.
[2] Ronald Rosenfeld. A maximum entropy approach to adaptive statistical language modeling. Computer Speech and Language, 10(3):187–228, 1996.
