By David Kun
This post gives a short review of the aggregate function as used for data.frames and presents some interesting uses: from the trivial but handy to the most complicated problems I have solved with aggregate.
Aggregate (data.frame): Technical Overview
Aggregate is a function in base R which can, as the name suggests, aggregate the input data.frame d.f by applying a function specified by the FUN parameter to each column of the sub-data.frames defined by the by input parameter.
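As a minimal sketch of such a call (the toy data and the column names group, x and y are made up for illustration and are not from the original post), the following applies mean to every column of x within each group:

```r
# Hypothetical toy data; d.f, group, x and y are illustrative names only.
d.f <- data.frame(
  group = c("a", "a", "b", "b", "b"),
  x     = c(1, 2, 3, 4, 5),
  y     = c(10, 20, 30, 40, 50)
)

# Apply mean to each column of x, separately for each value of `by`.
res <- aggregate(x = d.f[c("x", "y")], by = list(group = d.f$group), FUN = mean)
print(res)
```

The result has one row per group, with the group label followed by the per-group means of x and y.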
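The by parameter has to be a list. However, since data.frames are handled as (named) lists of columns, one or more columns of a data.frame can also be passed as the by parameter. Note that if these columns belong to the same data.frame as the one passed as x, the data.frame method still passes them on to the FUN function along with every other column of x. Only the formula interface leaves the grouping columns out, so with the data.frame method they usually have to be dropped from x by hand.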
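A short sketch of the list requirement, again on made-up toy data (the names d.f, group, x and y are assumptions for illustration): a bare vector is rejected as by, while a one-column data.frame is accepted because it is itself a list.

```r
# Hypothetical toy data, for illustration only.
d.f <- data.frame(
  group = c("a", "a", "b", "b", "b"),
  x     = c(1, 2, 3, 4, 5),
  y     = c(10, 20, 30, 40, 50)
)

# A bare vector is not a list, so this call fails; keep the message instead of stopping.
err <- tryCatch(
  aggregate(d.f[c("x", "y")], by = d.f$group, FUN = sum),
  error = function(e) conditionMessage(e)
)

# Wrapping the vector in a list works ...
res_list <- aggregate(d.f[c("x", "y")], by = list(group = d.f$group), FUN = sum)

# ... and so does a one-column data.frame, which is a named list of columns.
res_df <- aggregate(d.f[c("x", "y")], by = d.f["group"], FUN = sum)
```

Both successful calls give the same per-group sums; the only difference is how the grouping variable is handed over.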
The function to apply has to be able to accept a vector (since it will be called with parts of a column of a data.frame as input).
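To illustrate, here is a sketch with a hand-written function (spread is a hypothetical helper, not part of base R) that checks it really receives a plain vector before computing the range of each group:

```r
# Hypothetical toy data, for illustration only.
d.f <- data.frame(
  group = c("a", "a", "b", "b", "b"),
  x     = c(1, 2, 3, 4, 5)
)

# spread() is a made-up helper: it expects a numeric vector, not a data.frame.
spread <- function(v) {
  stopifnot(is.numeric(v), is.null(dim(v)))  # a plain vector has no dimensions
  max(v) - min(v)
}

# FUN is called once per column and per group, each time with a piece of that column.
res <- aggregate(d.f["x"], by = list(group = d.f$group), FUN = spread)
print(res)
```

If FUN were handed a whole sub-data.frame instead, the check inside spread would fail; it passes because each call only sees one slice of one column.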
The sub-data.frames defined by the by input parameter can be thought of as logical indexing: select the rows of d.f for which by equals its i-th unique value, and do this for every i between 1 and length(unique(by)). Note that the by variable doesn't have to agree with one (or more) column of the data.frame but could be anything. Hence, one can reproduce the aggregate functionality by a for cycle running the cycle variable over the unique values of the variable passed as by and an sapply applying the function passed as FUN to each column of the sub-data.frame. Such a workaround, however, would be very difficult to document, as it would be unclear what this code is actually doing (and why).
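The workaround could look roughly like the sketch below. It assumes a single grouping vector and a FUN that returns one value per column; the data and the names by, groups, rows and manual are made up for illustration, and this is not the author's code.

```r
# Hypothetical toy data; `by` is deliberately not a column of d.f.
d.f <- data.frame(x = c(1, 2, 3, 4, 5), y = c(10, 20, 30, 40, 50))
by  <- c("a", "a", "b", "b", "b")

groups <- sort(unique(by))               # aggregate returns the groups in sorted order
rows   <- vector("list", length(groups))

for (i in seq_along(groups)) {
  # Logical indexing: keep only the rows that belong to the i-th group.
  sub.data.frame <- d.f[by == groups[i], , drop = FALSE]
  # Apply the function to each column of the sub-data.frame.
  rows[[i]] <- sapply(sub.data.frame, mean)
}

manual  <- data.frame(Group.1 = groups, do.call(rbind, rows))
builtin <- aggregate(d.f, by = list(by), FUN = mean)
```

On this input, manual and builtin hold the same group labels and the same per-group means, but the single aggregate call states the intent far more directly than the loop does.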
Aggregate always returns a data.frame as a result. This data.frame will contain the (now unique) values from the input parameter by as the first column and then columns containing the results of the call to ...
Source: r-bloggers.com