Channel: r software hub

Finding the K in K-means by Parametric Bootstrap


By Nina Zumel


One of the trickier tasks in clustering is determining the appropriate number of clusters. Domain-specific knowledge is always best, when you have it, but there are a number of heuristics for getting at the likely number of clusters in your data. We cover a few of them in Chapter 8 (available as a free sample chapter) of our book Practical Data Science with R.

We also came upon another cool approach, in the mixtools package for mixture model analysis. As with clustering, if you want to fit a mixture model (say, a mixture of Gaussians) to your data, it helps to know how many components are in your mixture. The boot.comp function estimates the number of components (let’s call it k) by incrementally testing the hypothesis that there are k+1 components against the null hypothesis that there are k components, via parametric bootstrap.
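To make the "incremental testing" idea concrete, here is a minimal Python sketch of the sequential logic only. It is not the mixtools implementation: the function names, the `alpha` threshold, and the +1 correction in the p-value are assumptions chosen for illustration, and the caller is expected to supply the test statistics (for example, from fitting models with k and k+1 components to real and simulated data).

```python
def bootstrap_p_value(observed_stat, null_stats):
    """Share of simulated null statistics at least as large as the observed one.

    The +1 in numerator and denominator is a common convention that keeps
    the p-value away from exactly zero (an assumption, not from the article).
    """
    exceed = sum(1 for s in null_stats if s >= observed_stat)
    return (exceed + 1) / (len(null_stats) + 1)


def estimate_k(p_value_for_k, max_k=10, alpha=0.05):
    """Return the smallest k whose null hypothesis "k components" is not rejected.

    p_value_for_k is a hypothetical callable: given k, it returns the
    bootstrap p-value for testing k components against k+1 components.
    """
    for k in range(1, max_k):
        if p_value_for_k(k) > alpha:
            return k  # cannot reject "k components", so stop here
    return max_k  # every smaller k was rejected
```

In use, `p_value_for_k` would wrap the expensive part: fit the k-component model, simulate data sets from it, and compare the observed improvement of the k+1 fit with the improvements seen on the simulated data.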

You can use a similar idea to estimate the number of clusters in a clustering problem, if you make a few assumptions about the shape of the clusters. This approach is only heuristic, and more ad-hoc in the clustering situation than it is in mixture modeling. Still, it’s another approach to add to your toolkit, and estimating the number of clusters via a variety of different heuristics isn’t a bad idea.

The Idea

Suppose this is our data:

In two dimensions, it’s pretty easy to see how many clusters to try for, but in higher dimensions this gets more difficult. Let’s set as our null hypothesis that this data is broken into two clusters.


We can now estimate the mean and covariance matrices of these two clusters, for instance by using principal components analysis. If we assume that the clusters were generated by Gaussian processes with the observed means and covariance matrices, then we can generate synthetic data …
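The step described here (estimate each cluster's mean and covariance, then draw synthetic points from the corresponding Gaussians) can be sketched as follows. This is an illustration under stated assumptions, not the author's code: it uses the plain sample mean and covariance rather than principal components analysis, it relies only on the Python standard library so that it stays self-contained, and all function names are hypothetical.

```python
import math
import random


def mean_and_cov(points):
    """Sample mean vector and unbiased covariance matrix of equal-length points."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    cov = [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in points) / (n - 1)
            for j in range(d)] for i in range(d)]
    return mean, cov


def cholesky(a):
    """Lower-triangular L with L * L^T == a (a must be symmetric positive definite)."""
    d = len(a)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def sample_gaussian(mean, cov, n, rng):
    """Draw n points from a multivariate Gaussian via mean + L * z, z ~ N(0, I)."""
    L = cholesky(cov)
    d = len(mean)
    samples = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        samples.append([mean[i] + sum(L[i][k] * z[k] for k in range(i + 1))
                        for i in range(d)])
    return samples


def synthetic_from_clusters(clusters, rng):
    """One synthetic data set: each cluster is replaced by an equally sized
    Gaussian sample with that cluster's estimated mean and covariance."""
    synthetic = []
    for pts in clusters:
        mean, cov = mean_and_cov(pts)
        synthetic.extend(sample_gaussian(mean, cov, len(pts), rng))
    return synthetic
```

Here `clusters` is assumed to be a list of point lists, one per cluster under the null hypothesis, and `rng` a `random.Random` instance so that runs are reproducible. Each cluster needs more points than dimensions, otherwise the covariance estimate is singular and the Cholesky step fails.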

Source: r-bloggers.com

