By Andrea Spanò
We use summarise() with aggregate functions, which take a vector of values and return a single number. The function summarise_each() offers an alternative approach to summarise() with identical results.
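As a minimal sketch of what "aggregate function" means here, the snippet below uses a made-up vector and a toy data frame (both hypothetical, neither comes from the post) to show mean() collapsing several values into one number, first on its own and then inside summarise().

```r
library(dplyr)

# An aggregate function collapses a vector into a single value
x <- c(2, 4, 6, 8)          # hypothetical values, chosen for illustration
single_value <- mean(x)     # one number summarising the four inputs

# summarise() applies the same idea to a column of a data frame
toy <- data.frame(x = x)    # hypothetical toy data frame
result <- toy %>% summarise(mean_x = mean(x))

single_value
result                      # a data frame with one row and one column
```

Whatever the number of input rows, the summarised result has a single row, because mean() returns a single value.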
This post compares the behavior of summarise() and summarise_each() with respect to two factors we can control:
- How many variables to manipulate
  - 1A. a single variable
  - 1B. more than one variable
- How many functions to apply to each variable
  - 2A. a single function
  - 2B. more than one function
resulting in the following four cases:
- Case 1: apply one function to one variable
- Case 2: apply many functions to one variable
- Case 3: apply one function to many variables
- Case 4: apply many functions to many variables
These four cases will also be tested with and without group_by().
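To make the "with and without group_by()" comparison concrete, here is a rough sketch of Case 4 using summarise() only. It runs on the built-in mtcars data frame described in the next section; the choice of min() and max() and the output column names are illustrative assumptions, not taken from the post.

```r
library(dplyr)

# Case 4 without grouping: two functions applied to two variables
ungrouped <- mtcars %>%
  summarise(
    min_mpg  = min(mpg),  max_mpg  = max(mpg),
    min_disp = min(disp), max_disp = max(disp)
  )

# The same summary, computed once per distinct value of cyl
grouped <- mtcars %>%
  group_by(cyl) %>%
  summarise(
    min_mpg  = min(mpg),  max_mpg  = max(mpg),
    min_disp = min(disp), max_disp = max(disp)
  )

nrow(ungrouped)  # one row for the whole data frame
nrow(grouped)    # one row per cyl group
```

Grouping does not change the summary expressions themselves. It only changes how many rows come back, and it adds the grouping variable as the first column.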
The mtcars data frame
For this article we will use the well-known mtcars data frame.
We will first transform it into a tbl_df object; the underlying data.frame is unchanged, but a much better print method becomes available.
Finally, to keep this article tidy and clean, we will select only three variables of interest:
library(dplyr)
mtcars <- mtcars %>% tbl_df() %>% select(cyl, mpg, disp)
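The claim that the conversion leaves the data frame itself intact can be checked with a short sketch. It assumes a recent dplyr, where as_tibble() is the recommended replacement for tbl_df(); the post itself uses tbl_df(), and the name cars_tbl is hypothetical.

```r
library(dplyr)

# as_tibble() stands in for tbl_df() here (an assumption about the
# installed dplyr version; the post itself calls tbl_df())
cars_tbl <- as_tibble(mtcars) %>% select(cyl, mpg, disp)

class(cars_tbl)          # tibble classes plus "data.frame"
is.data.frame(cars_tbl)  # still a data frame as far as base R is concerned
dim(cars_tbl)            # 32 rows, 3 selected columns

# The column values are the same as in the original object
all(cars_tbl$mpg == mtcars$mpg)
```

Because the result still inherits from data.frame, functions that expect a plain data frame generally continue to work on it.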
Case 1: apply one function to one variable
In this case, summarise() is the simplest candidate.
# without group
mtcars %>%
  summarise(mean_mpg = mean(mpg))
## Source: local data frame [1 x 1]
##
##   mean_mpg
##      (dbl)
## 1 20.09062
# with group
mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg))
## Source: local data frame [3 x 2]
##
##     cyl mean_mpg
##   (dbl)    (dbl)
## 1     4 26.66364
## 2     6 19.74286
## 3     8 15.10000
We could use summarise_each() as well, but its usage results in a loss of clarity.
# without group
mtcars %>% ...
Source: r-bloggers.com