Natural Language Understanding Wiki

The simplest approach in NLP is to look for words. Because it is simple and needs few resources, it has always been very common; for example, the best search engines are word-based.

Some projects that employ a word-based approach, according to Cambria & White (2014)[1]:

  1. Ortony’s Affective Lexicon (Ortony, Clore, & Collins, 1988), which groups words into affective categories
  2. Penn Treebank (Marcus, Santorini, & Marcinkiewicz, 1994), a corpus consisting of over 4.5 million words of American English annotated for part-of-speech (POS) information
  3. PageRank (Page, Brin, Motwani, & Winograd, 1999), the famous ranking algorithm of Google
  4. LexRank (Günes & Radev, 2004), a stochastic graph-based method for computing the relative importance of textual units for NLP
  5. TextRank (Mihalcea & Tarau, 2004), a graph-based ranking model for text processing, based on two unsupervised methods for keyword and sentence extraction
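The graph-based ranking idea behind the last few entries can be sketched in a few lines. The code below is a simplified illustration in the spirit of TextRank keyword extraction, not the authors' implementation: it links words that occur close together and scores each word with a PageRank-style update. The function name rank_words, its parameters, and the sample text are hypothetical, and the sketch assumes no part-of-speech filtering or stop-word removal, which a fuller version would likely include.

```python
import re
from collections import defaultdict


def rank_words(text, window=2, damping=0.85, iterations=50):
    """Rank words by a PageRank-style score on a co-occurrence graph.

    Simplified sketch: every alphabetic token is a candidate, and two
    words are linked if they appear within `window` positions of each other.
    """
    words = re.findall(r"[a-z]+", text.lower())

    # Build an undirected co-occurrence graph.
    neighbors = defaultdict(set)
    for i, word in enumerate(words):
        for j in range(i + 1, min(i + window + 1, len(words))):
            if words[j] != word:
                neighbors[word].add(words[j])
                neighbors[words[j]].add(word)

    # Iterate the update: score(w) = (1 - d) + d * sum(score(n) / degree(n)).
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        updated = {}
        for word in neighbors:
            incoming = sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            updated[word] = (1 - damping) + damping * incoming
        scores = updated

    # Highest-scoring words first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical input: "graph" co-occurs with every other word.
for word, score in rank_words("graph ranking graph model graph text", window=1):
    print(f"{word}: {score:.3f}")
```

With window=1 the sample text forms a star-shaped graph centered on "graph", so that word receives the highest score. Note that the ranking still depends only on which word forms appear and how they co-occur.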


Weaknesses of the word-based approach:

  • Reliance on surface features: a document about dogs may never use the word "dog" because it refers to specific breed names instead.
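A minimal sketch of this weakness, assuming a naive matcher that only checks whether the query words occur verbatim in a document. The function names and both sample documents are invented for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def word_match(query, document):
    """Return True if every query word occurs verbatim in the document."""
    document_words = set(tokenize(document))
    return all(word in document_words for word in tokenize(query))


# Hypothetical documents: both are about dogs, only one uses the word.
doc_generic = "My dog loves long walks in the park."
doc_breeds = "The beagle and the poodle chased a labrador across the park."

print(word_match("dog", doc_generic))  # True
print(word_match("dog", doc_breeds))   # False
```

The second document is missed even though a human reader would consider it relevant, because the matcher has no knowledge that beagles, poodles, and labradors are dogs.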


  1. Cambria, E., & White, B. (2014). Jumping NLP curves: A review of natural language processing research. IEEE Computational Intelligence Magazine, 9(2), 48-57.