From Zygmunt Z. on FastML
With a random forest, in contrast, the first parameter to select is the number of trees. Easy: the more, the better. That’s because the multitude of trees serves to reduce variance. Each tree fits, or overfits, a part of the training set, and in the end their errors cancel out, at least partially. Random forests do overfit, just compare the error on train and validation sets.
Other parameters you may want to look at are those controlling how big a tree can grow. As mentioned above, averaging predictions from each tree counteracts overfitting, so usually one wants biggish trees.
One such parameter is min. samples per leaf. In scikit-learn’s RF, it’s value is one by default. Sometimes you can try increasing this value a little bit to get smaller trees and less overfitting. This CoverType benchmark overdoes it, going from 1 to 13 at once. Try 2 or 3 first.
Finally, there’s max. features to consider. Once upon a time, we tried tuning that param, to no avail. We suspect that it may have a better effect when dealing with sparse data - it would make sense to try increasing it then.