
Algorithm

From Antoine (on Cross Validated):

  • Boosting is based on weak learners (high bias, low variance). In terms of decision trees, weak learners are shallow trees, sometimes even as small as decision stumps (trees with two leaves). Boosting reduces error mainly by reducing bias (and also to some extent variance, by aggregating the output from many models).
  • On the other hand, Random Forest, as you said, uses fully grown decision trees (low bias, high variance). It tackles the error reduction task in the opposite way: by reducing variance. The trees are made uncorrelated to maximize the decrease in variance, but the algorithm cannot reduce bias (which is slightly higher than the bias of an individual tree in the forest). Hence the need for large, unpruned trees, so that the bias is initially as low as possible.

Please note that unlike Boosting (which is sequential), RF grows trees in parallel.
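The contrast above can be sketched in a few lines of Python. The scikit-learn estimators and the hyperparameter values below are illustrative assumptions (Antoine's answer names no library): shallow trees for boosting, fully grown trees for the forest.

    # Sketch: shallow (high-bias) trees for boosting vs. fully grown (low-bias)
    # trees for a random forest. Library and parameter choices are assumptions.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

    # Boosting: many shallow trees fitted sequentially, each correcting the last.
    boosted = GradientBoostingClassifier(n_estimators=200, max_depth=2, random_state=0)

    # Random forest: unpruned trees (max_depth=None) grown independently and
    # decorrelated via bootstrap samples and random feature subsets.
    forest = RandomForestClassifier(n_estimators=200, max_depth=None, random_state=0)

    print("boosting:     ", cross_val_score(boosted, X, y, cv=5).mean())
    print("random forest:", cross_val_score(forest, X, y, cv=5).mean())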

Performance

From Gerome Pistre (on Quora):

"the consensus is generally that xgboost has the better performance (with similar speed). You can search the results of Kaggle’s past competitions on their blog to see that it is part of almost all of the winning’s solutions.

...

gradient boosting requires much more care in setting up. Whereas it is perfectly possible to “blindly” apply RF and end up with decent performance, with very little chance to overfit, it is pretty much meaningless to train xgboost without cross-validation. You will need to fine-tune the maximum depth of the trees and the shrinkage factor (eta in R & Python, I think)"
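A minimal sketch of what that tuning might look like with xgboost's Python API, cross-validating over max_depth and eta; the parameter grid and the dataset are illustrative assumptions.

    # Sketch: cross-validated tuning of max_depth and eta (the shrinkage factor),
    # as the quote recommends. Grid values and dataset are illustrative only.
    import xgboost as xgb
    from sklearn.datasets import make_classification

    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    dtrain = xgb.DMatrix(X, label=y)

    best = None
    for max_depth in (2, 4, 6):
        for eta in (0.05, 0.1, 0.3):
            params = {"objective": "binary:logistic", "max_depth": max_depth,
                      "eta": eta, "eval_metric": "error"}
            # xgb.cv runs k-fold cross-validation and stops adding rounds when
            # the held-out error stops improving.
            cv = xgb.cv(params, dtrain, num_boost_round=300, nfold=5,
                        early_stopping_rounds=10, seed=0)
            err = cv["test-error-mean"].min()
            if best is None or err < best[0]:
                best = (err, max_depth, eta)

    print("best (cv error, max_depth, eta):", best)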

From Zygmunt Z. (on FastML):

In 2005, Caruana et al. made an empirical comparison of supervised learning algorithms [video]. They included random forests and boosted decision trees and concluded that

With excellent performance on all eight metrics, calibrated boosted trees were the best learning algorithm overall. Random forests are close second.

Let’s note two things here. First, they mention calibrated boosted trees, meaning that for probabilistic classification the trees needed calibration to be the best. Second, it’s unclear what boosting method the authors used.
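For context, "calibration" here means post-processing the model's scores into well-behaved probabilities (the study used Platt scaling and isotonic regression). Below is a hedged sketch using scikit-learn's CalibratedClassifierCV as a stand-in; the estimator, dataset and metric are assumptions, not the study's setup.

    # Sketch: calibrating a boosted-tree classifier so its predicted probabilities
    # are better behaved. CalibratedClassifierCV is an assumed stand-in for the
    # calibration step, not the exact procedure from the study.
    from sklearn.calibration import CalibratedClassifierCV
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.metrics import brier_score_loss
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=3000, n_features=20, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    raw = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
    calibrated = CalibratedClassifierCV(
        GradientBoostingClassifier(random_state=0), method="isotonic", cv=5
    ).fit(X_tr, y_tr)

    # A lower Brier score indicates better-calibrated probabilities.
    print("raw:       ", brier_score_loss(y_te, raw.predict_proba(X_te)[:, 1]))
    print("calibrated:", brier_score_loss(y_te, calibrated.predict_proba(X_te)[:, 1]))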

In the follow-up study concerning supervised learning in high dimensions, the results are similar:

Although there is substantial variability in performance across problems and metrics in our experiments, we can discern several interesting results. First, the results confirm the experiments in (Caruana & Niculescu-Mizil, 2006) where boosted decision trees perform exceptionally well when dimensionality is low. In this study boosted trees are the method of choice for up to about 4000 dimensions. Above that, random forests have the best overall performance.
