Yesterday a family member forwarded me a Wall Street Journal interview titled “What Data Scientists Do All Day At Work.” The title intrigued me immediately, partly because I find myself explaining that same topic fairly regularly.
I wasn’t disappointed in the interview: General Electric’s Dr. Narasimhan gave insightful and well-communicated answers, and I both recognized familiar opinions and learned new perspectives. But I was disappointed that in an article about data scientists (!) they would include a chart this terrible:
Pie charts have a bad reputation among statisticians and data scientists, with good reason (see here for more). But this is an especially unfortunate example. We’re meant to compare and contrast these six tasks. But at a glance, do you have any idea whether more time is spent on “Presenting analysis” or on “Data cleaning”?
The problem with a lot of pie-chart bashing (and most “chart-shaming,” in fact) is that people don’t follow up with a better alternative. So here I’ll show how I would have created a different graph (using R and ggplot2) to communicate the same information. This also serves as an example of the thought process I go through in creating a data visualization.
(I’d note that this post is appropriate for Pi Day, but I’m more of a Tau Day observer anyway.)
Setup
I start by transcribing the data directly from the plot into R. readr::read_csv is useful for constructing a table on the fly:
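library(readr)

# transcribe the chart's values: one row per task, one column per time bucket
d <- read_csv("Task,< 1 a week,1-4 a week,1-3 a day,>4 a day
Basic exploratory data analysis,11,32,46,12
Data cleaning,19,42,31,7
Machine learning/statistics,34,29,27,10
Creating visualizations,23,41,29,7
Presenting analysis,27,47,20,6
Extract/transform/load,43,32,20,5")

# reorganize into long form: one row per task/time-bucket pair
library(tidyr)
d <- gather(d, Hours, Percentage, -Task)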
This constructs our data in the form:
Task | Hours | Percentage |
---|---|---|
Basic exploratory data analysis | < 1 a week | 11 |
Data cleaning | < 1 a week | 19 |
Machine learning/statistics | < 1 a week | 34 |
Creating visualizations | < 1 a week | 23 |
Presenting analysis | < 1 a week | 27 |
Extract/transform/load | < 1 a week | 43 |
Basic exploratory data analysis | 1-4 a week | 32 |
Data cleaning | 1-4 a week | 42 |
Machine learning/statistics | 1-4 a week | 29 |
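
As a side note, tidyr now marks gather() as superseded in favor of pivot_longer(). The reshaping step could be sketched with the newer function roughly as follows. This is a minimal sketch rather than the code from the post: the time-bucket labels in the header are my assumption about how the chart's categories were transcribed, and the object name `long` is just an illustrative choice.

```r
library(readr)
library(tidyr)

# Wide table: one row per task, one column per time bucket.
# The header labels are assumed; adjust them to match the chart.
d <- read_csv("Task,< 1 a week,1-4 a week,1-3 a day,>4 a day
Basic exploratory data analysis,11,32,46,12
Data cleaning,19,42,31,7
Machine learning/statistics,34,29,27,10
Creating visualizations,23,41,29,7
Presenting analysis,27,47,20,6
Extract/transform/load,43,32,20,5")

# Long table: one row per (Task, Hours) pair.
long <- pivot_longer(
  d,
  cols = -Task,             # reshape every column except Task
  names_to = "Hours",       # old column names become the Hours values
  values_to = "Percentage"  # cell values become the Percentage column
)
```

The result has the same three columns as the gather() version. The rows come out in a different order, though: pivot_longer() keeps each task's rows together, while gather() stacks one time bucket after another.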
Bar plot
The most common way a pie chart can be improved is by turning it into a bar chart, with categories on …
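
As a rough illustration of the pie-to-bar idea (not necessarily the plot the post goes on to build), here is one way such a bar chart could be sketched with ggplot2. The time-bucket labels in the header and the choice of one facet per task are my assumptions, made only for the example.

```r
library(readr)
library(tidyr)
library(ggplot2)

# Header labels are assumed; adjust them to match the chart.
d <- read_csv("Task,< 1 a week,1-4 a week,1-3 a day,>4 a day
Basic exploratory data analysis,11,32,46,12
Data cleaning,19,42,31,7
Machine learning/statistics,34,29,27,10
Creating visualizations,23,41,29,7
Presenting analysis,27,47,20,6
Extract/transform/load,43,32,20,5")

# Remember the bucket order before reshaping, so the bars are not
# sorted alphabetically on the axis.
bucket_levels <- names(d)[-1]

long <- gather(d, Hours, Percentage, -Task)
long$Hours <- factor(long$Hours, levels = bucket_levels)

# One panel per task, one bar per time bucket.
p <- ggplot(long, aes(x = Hours, y = Percentage)) +
  geom_col() +
  facet_wrap(~ Task) +
  labs(x = "Time spent on task", y = "Percentage")
```

Calling print(p) draws the chart. With bars sharing a common baseline, comparing two tasks within the same time bucket becomes a matter of comparing bar heights rather than wedge angles.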
Source: r-bloggers.com