Natural Language Understanding Wiki

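An exponential language model, or maximum entropy language model, uses the following formula to express the conditional probability of a word $w$ given its context (history) $h$:

$$P(w \mid h) = \frac{1}{Z(h)} \exp\left(\sum_i \lambda_i f_i(h, w)\right)$$

where the $\lambda_i$ are the parameters, the $f_i(h, w)$ are arbitrary functions of the pair $(h, w)$, and $Z(h)$ is a normalization factor that sums over every word $w'$ in the vocabulary so that the probabilities sum to one:

$$Z(h) = \sum_{w'} \exp\left(\sum_i \lambda_i f_i(h, w')\right)$$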
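A minimal Python sketch of evaluating $P(w \mid h) = \exp\left(\sum_i \lambda_i f_i(h, w)\right) / Z(h)$, where $Z(h)$ sums the same exponential over the whole vocabulary. The vocabulary, the two indicator features, and the weights are hypothetical toy values, and representing a history as a tuple of preceding words is an assumption made only for illustration.

```python
import math
from typing import Callable, Sequence

# Assumption: a feature maps (history, word) to a real value; histories are
# tuples of preceding words in this sketch.
Feature = Callable[[tuple, str], float]


def weighted_sum(history: tuple, word: str, features: Sequence[Feature],
                 weights: Sequence[float]) -> float:
    """Return sum_i lambda_i * f_i(h, w)."""
    return sum(lam * f(history, word) for lam, f in zip(weights, features))


def conditional_prob(history: tuple, word: str, vocab: Sequence[str],
                     features: Sequence[Feature], weights: Sequence[float]) -> float:
    """Return P(w | h) = exp(sum_i lambda_i f_i(h, w)) / Z(h)."""
    scores = {v: weighted_sum(history, v, features, weights) for v in vocab}
    # log Z(h) via log-sum-exp over the whole vocabulary (avoids overflow).
    m = max(scores.values())
    log_z = m + math.log(sum(math.exp(s - m) for s in scores.values()))
    return math.exp(scores[word] - log_z)


# Hypothetical toy vocabulary, features, and weights.
vocab = ["the", "cat", "sat"]
features = [
    lambda h, w: 1.0 if w == "the" else 0.0,                           # unigram indicator
    lambda h, w: 1.0 if h and h[-1] == "the" and w == "cat" else 0.0,  # bigram indicator
]
weights = [0.5, 2.0]

for v in vocab:
    print(v, round(conditional_prob(("the",), v, vocab, features, weights), 4))
```

Computing $Z(h)$ requires a pass over the entire vocabulary for each context, which dominates the cost of evaluating such a model; the log-sum-exp shift only guards against overflow and does not change the result.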
The parameters are learned from the training data according to the maximum entropy principle. The model was first introduced into language modeling by Della Pietra et al. (1992)[1] and was later investigated systematically by Rosenfeld (1996)[2].
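The text does not specify a training algorithm. Iterative scaling methods such as Generalized Iterative Scaling are a classic choice; the sketch below instead swaps in plain batch gradient ascent on the average conditional log-likelihood, which is concave in the weights and should approach the same optimum. The gradient for $\lambda_i$ is the observed value of $f_i$ minus its expectation under the model, so at the optimum the model's expected feature values (summed over the training histories) match the empirical ones, which are the constraints of the maximum entropy formulation. The names `fit` and `model_probs`, the learning rate, the epoch count, and the toy data are hypothetical, and the sketch uses no smoothing or regularization.

```python
import math


def model_probs(history, vocab, features, weights):
    """Distribution P(. | h) over the vocabulary under the exponential model."""
    scores = [sum(lam * f(history, v) for lam, f in zip(weights, features)) for v in vocab]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)  # Z(h), up to the constant factor exp(m)
    return [e / z for e in exps]


def fit(data, vocab, features, lr=1.0, epochs=500):
    """Batch gradient ascent on the average conditional log-likelihood.

    data: list of (history, word) training pairs (hypothetical format).
    """
    weights = [0.0] * len(features)
    for _ in range(epochs):
        grad = [0.0] * len(features)
        for h, w in data:
            probs = model_probs(h, vocab, features, weights)
            for i, f in enumerate(features):
                observed = f(h, w)                                          # empirical value
                expected = sum(p * f(h, v) for p, v in zip(probs, vocab))  # model expectation
                grad[i] += observed - expected
        weights = [lam + lr * g / len(data) for lam, g in zip(weights, grad)]
    return weights


# Hypothetical toy data: "cat" follows "the" in two of three cases.
vocab = ["the", "cat", "sat"]
features = [lambda h, w: 1.0 if h and h[-1] == "the" and w == "cat" else 0.0]
data = [(("the",), "cat"), (("the",), "cat"), (("the",), "sat")]
weights = fit(data, vocab, features)
print(weights, model_probs(("the",), vocab, features, weights))
```

With a single feature the optimum can be checked by hand: the model gives $P(\text{cat} \mid \text{the}) = e^{\lambda} / (e^{\lambda} + 2)$, and matching the empirical rate $2/3$ requires $e^{\lambda} = 4$, so the learned weight should approach $\ln 4$.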


  1. Stephen A. Della Pietra, Vincent J. Della Pietra, Robert L. Mercer, and Salim Roukos. Adaptive language modeling using minimum discriminant estimation. In Proceedings of the Workshop on Speech and Natural Language, pages 103–106, 1992.
  2. Ronald Rosenfeld. A maximum entropy approach to adaptive statistical language modeling. Computer Speech and Language, 10(3):187–228, 1996.