
When k-means Clustering Fails


By Jonathan Callahan


Letting the computer automatically find groupings in data is incredibly powerful and is at the heart of “data mining” and “machine learning”. One of the most widely used methods for clustering data is k-means clustering. Unfortunately, k-means clustering can fail spectacularly, as in the example below.

Centroid-based clustering algorithms work on multi-dimensional data by partitioning data points into k clusters such that the sum of squared distances from each point to its assigned cluster center is minimized. In simple terms, each cluster contains the data points that are “close” to its cluster center.
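To make the quantity being minimized concrete, here is a minimal Python sketch of the sum-of-squares objective for a fixed set of centers. The function names and the four sample points are made up for illustration and are not from the original post; a full k-means implementation would also update the centers repeatedly until the assignments stop changing.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign_to_nearest(points, centers):
    """Return, for each point, the index of its closest center."""
    return [
        min(range(len(centers)), key=lambda j: squared_distance(p, centers[j]))
        for p in points
    ]


def within_cluster_ss(points, centers, labels):
    """Sum of squared distances from each point to its assigned center."""
    return sum(squared_distance(p, centers[k]) for p, k in zip(points, labels))


# Hypothetical toy data: two tight pairs of points, far apart from each other.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers = [(0.0, 0.5), (10.0, 10.5)]

labels = assign_to_nearest(points, centers)
total = within_cluster_ss(points, centers, labels)
print(labels, total)
```

Because every point contributes equally to this total, a large group of points pulls a center toward itself far more strongly than a group of only two or three points does.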

Many of the simple examples online demonstrate what happens when a clustering algorithm partitions data into roughly equal groups. But what happens when we know ahead of time that the groups we are looking for have very different sizes?

In our case, we are working with pollution monitoring data. Each file is associated with a particular monitor, and each record within the file contains the latitude and longitude associated with an hourly measurement made by that monitor. These files offer insight into the life of one of these monitors:

  1. turned on in the lab for a few hours for testing
  2. turned on in the parking lot across from the lab for a day of outside testing
  3. field tested a few miles away
  4. field tested again after repositioning
  5. shipped across the country to be deployed
  6. deployed at site A
  7. moved to site B
  8. repositioned at site B

So we have a few records in one or two locations very close to each other, then a few more records at a couple of sites nearby, and then some longer time series at one or more deployment sites with minor repositioning at the sites. Our goal is to cluster the latitude-longitude pairs into groups that differentiate between deployments that are hundreds of miles apart but that group together small movements associated with …
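One way to picture this goal is a plain distance cutoff: any two records closer than some threshold belong to the same group. The Python sketch below is only an illustration of that idea and is not the method from the original post. The coordinates, the 20 km cutoff, and the function names are all assumptions chosen for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def group_by_distance(coords, cutoff_km):
    """Label points so that any two within cutoff_km share a group (transitively)."""
    parent = list(range(len(coords)))

    def find(i):
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if haversine_km(coords[i], coords[j]) <= cutoff_km:
                parent[find(j)] = find(i)  # merge the two groups

    # Renumber the roots 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(len(coords))]


# Hypothetical monitor history: lab, parking lot, nearby field test,
# then two distant deployment sites with a small repositioning at the second.
coords = [
    (47.6000, -122.3000),  # lab
    (47.6005, -122.3005),  # parking lot
    (47.6500, -122.3500),  # field test a few miles away
    (38.0000, -90.0000),   # site A
    (39.5000, -92.0000),   # site B
    (39.5002, -92.0001),   # site B, repositioned
]
labels = group_by_distance(coords, cutoff_km=20.0)
print(labels)
```

Unlike k-means, this grouping does not care how many records each location has, so a site with three records is treated the same as one with thousands. The pairwise loop is quadratic in the number of records, and the transitive merging can chain together points that drift gradually, so it should be read as a sketch rather than a recommendation.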

Source: r-bloggers.com

