Confidence Intervals for Random Forests

By Joseph Rickert

Random Forests, the “go to” classifier for many data scientists, is a fairly complex algorithm with many moving parts that introduces randomness at different levels. Understanding exactly how the algorithm operates requires some work, and assessing how well a Random Forests model fits the data is a serious challenge. In the pragmatic world of machine learning and data science, assessing model performance often comes down to calculating the area under the ROC curve (or some other convenient measure) on a hold-out set of test data. If the ROC curve looks good, then the model is good to go.

Fortunately, however, goodness-of-fit issues have a kind of nagging persistence that just won’t leave statisticians alone. In a gem of a paper (and here) that sparkles with insight, the authors (Wager, Hastie and Efron) take considerable care to make things clear to the reader while showing how to calculate confidence intervals for Random Forests models.

Using the high-ground approach favored by theorists, Wager et al. achieve the result about Random Forests by solving a more general problem first: they derive estimates of the variance of bagged predictors that can be computed from the same bootstrap replicates that give the predictors. After pointing out that these estimators suffer from two distinct sources of noise:

  1. Sampling noise – noise resulting from the randomness of data collection
  2. Monte Carlo noise – noise that results from using a finite number of bootstrap replicates

they produce bias-corrected versions of jackknife and infinitesimal jackknife estimators. A very nice feature of the paper is the way the authors illustrate the theory with simulation experiments and then describe the simulations in enough detail in an appendix for readers to replicate the results. I generated the following code and figure to replicate Figure 1 of their the first …read more
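To make the idea of a variance estimate "computed from the same bootstrap replicates" concrete, here is a minimal Python sketch of the infinitesimal jackknife estimate with a Monte Carlo bias correction, as I understand it from the paper. It is not the code from the original post, which is cut off in this excerpt. The function names (`bootstrap_counts`, `ij_variance`) are my own, and the base learner is just the sample mean, a toy stand-in for a tree, chosen because the large-B answer is known in closed form.

```python
import numpy as np


def bootstrap_counts(n, B, rng):
    """Return a (B, n) matrix N where N[b, i] is the number of times
    observation i appears in bootstrap replicate b."""
    return rng.multinomial(n, np.full(n, 1.0 / n), size=B)


def ij_variance(counts, preds):
    """Infinitesimal jackknife variance estimate for a bagged prediction.

    counts : (B, n) in-bag counts N[b, i]
    preds  : (B,) prediction t_b made by the learner fit on replicate b

    Returns (raw, corrected) where
      raw       = sum_i Cov_i^2,
                  Cov_i = (1/B) * sum_b (N[b, i] - 1) * (t_b - mean(t))
      corrected = raw - (n / B^2) * sum_b (t_b - mean(t))^2
    The subtracted term targets the Monte Carlo noise from using finite B.
    """
    B, n = counts.shape
    centered = preds - preds.mean()
    cov = (counts - 1.0).T @ centered / B  # one covariance per observation
    raw = float(np.sum(cov ** 2))
    corrected = raw - n / B ** 2 * float(np.sum(centered ** 2))
    return raw, corrected


# Toy experiment (assumption: the sample mean plays the role of the base learner).
rng = np.random.default_rng(0)
n, B = 50, 4000
y = rng.normal(size=n)

counts = bootstrap_counts(n, B, rng)
preds = counts @ y / n  # mean of each bootstrap sample

raw_var, corrected_var = ij_variance(counts, preds)

# For the bagged mean, the estimate with infinitely many replicates
# works out to sum_i (y_i - ybar)^2 / n^2.
reference = y.var(ddof=0) / n

print(f"raw IJ estimate:       {raw_var:.5f}")
print(f"bias-corrected:        {corrected_var:.5f}")
print(f"large-B reference:     {reference:.5f}")
```

With a real forest, `counts` would come from the in-bag bookkeeping of the fitted ensemble and `preds` from the per-tree predictions at a single test point; both are assumptions about what a given library exposes, so check its documentation before relying on this.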

Source: r-bloggers.com

