By John Mount
In R, analyses are typically organized around a data structure called a data.frame. data.frames are similar to database tables and provide a number of invariants (subject to the usual exceptions you always find in R):
- Data frames are two-dimensional arrays of cells, addressed by rows (representing instances) and columns (representing measurements).
- Cells are addressable by row indices and column indices or names.
- Every cell in a column has the same type.
- Every column has the same number of rows.
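A short base-R sketch of these invariants. The data.frame, its column names, and its values are arbitrary choices for illustration. It also hints at one of the "usual exceptions": data.frame() recycles a shorter column when its length evenly divides the row count, and rejects it otherwise.

```r
# A small data.frame with one integer column and one character column
tbl <- data.frame(id = 1:3, label = c("a", "b", "c"), stringsAsFactors = FALSE)

# Two-dimensional: 3 rows (instances) by 2 columns (measurements)
dim(tbl)

# Every cell in a column shares one type
sapply(tbl, class)

# Cells are addressable by row index plus column name or column index
tbl[2, "label"]
tbl[2, 2]

# Columns must have the same number of rows; lengths 3 and 2 cannot be
# reconciled by recycling, so data.frame() signals an error
bad <- tryCatch(data.frame(id = 1:3, label = c("a", "b")),
                error = function(e) conditionMessage(e))
bad
```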
This is a very powerful basis for analysis and software, and it is fairly central to R and to teaching R.
As you work with more data, you may want to use higher-performance, more powerful extensions of data.frame. In particular, the packages data.table and dplyr are two powerful, high-quality possibilities.
However, we do not certify Win-Vector LLC packages (vtreat, WVPlots) to work safely with data.table. This is due to a design choice in the otherwise excellent data.table package that we are not comfortable attempting to work around. Obviously data.table is a much more influential package than any of ours, but I think the issue is still worth discussing. Our suggested work-around is to convert any data.table you are working with to a native data.frame with an as.data.frame() call on the way into vtreat or WVPlots. Or, as I am sure some readers have already decided, you can just avoid vtreat and WVPlots.
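The suggested work-around can be wrapped in a tiny helper applied at the boundary of your code. This is a minimal sketch: the function name to_plain_data_frame is hypothetical and is not part of vtreat or WVPlots. Because data.table may not be installed, the main example simulates a data.frame subclass with structure(); the data.table branch only runs if that package is available.

```r
# Hypothetical helper (not part of vtreat or WVPlots): coerce any
# data.frame-like input, such as a data.table, to a native data.frame
to_plain_data_frame <- function(x) {
  if (identical(class(x), "data.frame")) {
    return(x)  # already a native data.frame; nothing to do
  }
  as.data.frame(x)
}

# Simulate a data.frame subclass without requiring extra packages
fancy <- structure(data.frame(x = 1:3, y = 5:7),
                   class = c("my_frame", "data.frame"))
plain <- to_plain_data_frame(fancy)
class(plain)

# If data.table happens to be installed, the same call converts a data.table
if (requireNamespace("data.table", quietly = TRUE)) {
  dt <- data.table::data.table(x = 1:3, y = 5:7)
  print(class(to_plain_data_frame(dt)))
}
```

Calling as.data.frame() directly is equivalent; the wrapper only makes the conversion point explicit.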
data.frame
data.frame supplies a number of ways to access data cells. The common ones are illustrated below.
Consider the following simple data.frame:
d <- data.frame(x=1:3,y=5:7)
We can get the contents of the column x through a number of different notations:
d[,'x']
d[,1]
d[,c(TRUE,FALSE)]
d$x
d[['x']]
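To check that these notations really agree on this example, the sketch below evaluates each one and compares the results with identical(). Note that d[,c(TRUE,FALSE)] works here because the logical vector acts as a column mask over the two columns, keeping x and skipping y.

```r
d <- data.frame(x = 1:3, y = 5:7)

# Each notation extracts column x as a plain integer vector
by_name    <- d[, 'x']
by_index   <- d[, 1]
by_logical <- d[, c(TRUE, FALSE)]   # logical column mask: keep x, skip y
by_dollar  <- d$x
by_double  <- d[['x']]

# Compare every result against the first one
results <- list(by_name, by_index, by_logical, by_dollar, by_double)
all(vapply(results, identical, logical(1), y = by_name))
by_name
```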
Now some of these should have the additional argument drop=FALSE set (which …
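A minimal base-R sketch of what drop=FALSE changes for the [ , ] notations: by default, selecting a single column collapses the result to a vector, while drop=FALSE keeps it a one-column data.frame. The $ and [[ ]] notations always return the column itself and take no drop argument.

```r
d <- data.frame(x = 1:3, y = 5:7)

# Default: a single-column [ , ] selection is dropped to a vector
dropped <- d[, 'x']
is.data.frame(dropped)

# drop = FALSE keeps the result as a one-column data.frame
kept <- d[, 'x', drop = FALSE]
is.data.frame(kept)
dim(kept)

# Selecting two columns returns a data.frame either way
is.data.frame(d[, c('x', 'y')])
```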
Source: win-vector.com