If you’ve read my blog, taken one of my classes, or sat next to me on an airplane, you probably know I’m a big fan of Hadley Wickham’s ggplot2 package, especially compared to base R plotting.
Not everyone agrees. Among the anti-ggplot2 crowd is JHU Professor Jeff Leek, who yesterday wrote up his thoughts on the Simply Statistics blog:
…one place I lose tons of street cred in the data science community is when I talk about ggplot2… ggplot2 is an R package/phenomenon for data visualization. It was created by Hadley Wickham, who is (in my opinion) perhaps the most important statistician/data scientist on the planet. It is one of the best maintained, most important, and really well done R packages. Hadley also supports R software like few other people on the planet.
But I don’t use ggplot2 and I get nervous when other people do.
Jeff is a great statistician, an excellent and experienced educator, and among my favorite scientific communicators. He and I agree strongly on a wide variety number of topics, ranging from peer review to p-values.
In short, I’ve learned a lot from him. So I appreciate the chance to return the favor. I’m going to try crossing this one last disagreement off the list.
Where base plotting is better
I’ll start by giving credit: there are plenty of cases that base plotting tools are superior. The tools were developed over many years by very smart people. While some methods haven’t stood the test of time, others have.
As one example (which Jeff brings up in his post), take clustered heatmaps. Heatmaps are in fact easy to make in ggplot2 with geom_tile
or geom_raster
, but not with row- and column-clustering built-in, which is essential in applications such as genomics. You’ll see that I use a base-plotting heatmap in my “Love Actually” post, …read more
Source:: r-bloggers.com