I was recently browsing stackoverflow.com (often called SO) for the highest-voted questions under the R tag.
To my surprise, many questions on the first page were already well addressed with the data.table package. I found a few other questions that could benefit from a data.table answer, so I went ahead and answered them.
In this post, I’d like to summarise them, along with benchmarks (where possible) and my comments, if any.
Many answers to highly voted questions seem to have been posted a while back. data.table is actively developed and has seen many improvements in speed and memory usage in recent years. It is therefore entirely possible that some of those answers perform even better now.
50 highest voted questions under the R tag
Here’s the list of the top 50 questions. I’ve marked those for which a data.table answer is available (which is usually quite performant).
Rank | Number of votes | Question title | data.table solution available |
---|---|---|---|
1 | 1153 | How to make a great R reproducible example? | |
2 | 621 | How to sort a dataframe by column(s)? | TRUE |
3 | 496 | R Grouping functions: sapply vs. lapply vs. apply. vs. tappl | TRUE |
4 | 429 | How can we make xkcd style graphs? | |
5 | 396 | How to join (merge) data frames (inner, outer, left, right)? | TRUE |
6 | 330 | What statistics should a programmer (or computer scientist) | |
7 | 314 | Drop columns in R data frame | TRUE |
8 | 290 | Tricks to manage the available memory in an R session | |
9 | 280 | Remove rows with NAs in data.frame | TRUE |
10 | 279 | Quickly reading very large tables as dataframes in R | TRUE |
11 | 263 | How to properly document S4 class slots using Roxygen2? | |
12 | 250 | Assignment operators in R: ‘=’ and ‘ | |
13 | 236 | Drop factor levels in a subsetted data frame | TRUE |
14 | 234 | Plot two graphs in same plot in R | |
15 | 225 | What is the difference between require() and library()? | |
16 | 221 | data.table vs dplyr: can one do something well the other can | |
17 | 216 | In R, why is [ better than subset ? | |
18 | 212 | R …read more
Source:: r-bloggers.com |