
Hierarchical Clustering in R


By Teja Kodali


Hello everyone! In this post, I will show you how to do hierarchical clustering in R. We will use the iris dataset again, like we did for k-means clustering.

What is hierarchical clustering?

If you recall from the post about k-means clustering, that method requires us to specify the number of clusters up front, and finding the optimal number can often be hard. Hierarchical clustering is an alternative approach that builds a hierarchy from the bottom up and doesn’t require us to specify the number of clusters beforehand.

The algorithm works as follows:

  • Put each data point in its own cluster.
  • Identify the closest two clusters and combine them into one cluster.
  • Repeat the above step until all the data points are in a single cluster.
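
To make the three steps concrete, here is a minimal sketch in R. The function name naive_agglomerate and the toy points are my own illustration, not part of any package, and the sketch assumes Euclidean distance with complete linkage (the maximum pairwise distance). It is far slower than hclust and is only meant to mirror the steps above.

```r
# Naive agglomerative clustering (illustrative only; hclust is much faster).
# naive_agglomerate is a hypothetical helper written for this example.
naive_agglomerate <- function(x, linkage = max) {
  d <- as.matrix(dist(x))                 # pairwise Euclidean distances
  clusters <- as.list(seq_len(nrow(d)))   # step 1: each point is its own cluster
  merges <- list()
  while (length(clusters) > 1) {
    best <- c(1, 2)
    best_dist <- Inf
    # step 2: find the closest pair of clusters under the chosen linkage
    for (i in seq_len(length(clusters) - 1)) {
      for (j in (i + 1):length(clusters)) {
        dij <- linkage(d[clusters[[i]], clusters[[j]]])
        if (dij < best_dist) {
          best_dist <- dij
          best <- c(i, j)
        }
      }
    }
    merged <- c(clusters[[best[1]]], clusters[[best[2]]])
    merges[[length(merges) + 1]] <- list(members = sort(merged),
                                         height = best_dist)
    # step 3: replace the two clusters with their union, then repeat
    clusters <- c(clusters[-best], list(merged))
  }
  merges
}

# Four toy points forming two obvious groups
pts <- matrix(c(0, 0,
                0, 1,
                5, 5,
                5, 7), ncol = 2, byrow = TRUE)
merges <- naive_agglomerate(pts)
heights <- sapply(merges, function(m) m$height)
```

Each entry of merges records which points were joined and at what distance. On these toy points the merge heights should agree with those reported by hclust(dist(pts)).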

The resulting hierarchy is usually drawn as a dendrogram, a tree in which the height of each join shows the distance at which two clusters were merged.

There are a few ways to determine how close two clusters are:

  • Complete linkage clustering: Find the maximum possible distance between points belonging to two different clusters.
  • Single linkage clustering: Find the minimum possible distance between points belonging to two different clusters.
  • Mean (average) linkage clustering: Find all possible pairwise distances for points belonging to two different clusters and then calculate the average.
  • Centroid linkage clustering: Find the centroid of each cluster and calculate the distance between centroids of two clusters.

Complete linkage and mean linkage clustering are the ones used most often.
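
As a small worked example, the four linkage definitions above can be computed by hand for two tiny clusters. The points and variable names below are made up for illustration, and Euclidean distance is assumed.

```r
# Two toy clusters (hypothetical data)
a <- matrix(c(0, 0,
              0, 2), ncol = 2, byrow = TRUE)   # cluster A: two points
b <- matrix(c(3, 0,
              5, 0), ncol = 2, byrow = TRUE)   # cluster B: two points

# Distances between every point in A (rows) and every point in B (columns)
all_d <- as.matrix(dist(rbind(a, b)))
cross <- all_d[seq_len(nrow(a)), nrow(a) + seq_len(nrow(b))]

complete_link <- max(cross)    # largest cross-cluster distance
single_link   <- min(cross)    # smallest cross-cluster distance
mean_link     <- mean(cross)   # average of all cross-cluster distances
# distance between the two cluster centroids
centroid_link <- sqrt(sum((colMeans(a) - colMeans(b))^2))
```

In hclust, these correspond to the method values "complete", "single", "average" and "centroid".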

Clustering

In my post on k-means clustering, we saw that the iris dataset contains three different species of flowers.

Let us see how well the hierarchical clustering algorithm can do. We can use hclust for this. hclust takes a distance matrix rather than the raw data, and we can compute one with dist, which uses Euclidean distance by default. Unless told otherwise, hclust uses the complete linkage method.
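
Before running this on iris, here is a quick sketch on four made-up points that shows the two pieces separately: what dist returns, and which linkage hclust picks when no method is given. The data and variable names are hypothetical.

```r
# Four toy points in two obvious groups (hypothetical data, not from iris)
pts <- matrix(c(0, 0,
                0, 1,
                5, 5,
                5, 6), ncol = 2, byrow = TRUE)

d <- dist(pts)      # Euclidean distance by default; returns a "dist" object
hc <- hclust(d)     # no method given, so the default linkage applies

class(d)            # "dist"
hc$method           # "complete"

# Passing the method explicitly gives the same tree as the default
same_tree <- identical(hc$merge, hclust(d, method = "complete")$merge)
```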

clusters <- hclust(dist(iris[, 3:4]))  # assumed columns: petal length and width
plot(clusters)

which generates the following dendrogram:

We can see from the figure that the best choices for total number of clusters are either ...

Source: r-bloggers.com

