By Tal Galili
R has traditionally been very slow at reading and writing csv files of, say, 1 million rows or more. Getting data into R is often the first task a user needs to do and if they have a poor experience (either hard to use, or very slow) they are less likely to progress. The data.table package in R solved csv import convenience and speed in 2013 by implementingdata.table::fread() in C. The examples at that time were 30 seconds down to 3 seconds to read 50MB and over 1 hour down to 8 minute to read 20GB. In 2015 Hadley Wickham implemented readr::read_csv() in C++.
But what about writing csv files?
It is often the case that the end goal of work in R is a report, a visualization or a summary of some form. Not anything large. So we don’t really need to export large data, right? It turns out that the combination of speed, expressiveness and advanced abilities of R (particularly data.table) mean that it can be faster (so I’m told) to export the data from other environments (e.g. a database), manipulate in R, and export back to the database, than it is keeping the data in the database. However, the data export step out of R being the biggest bottleneck preventing that workflow is being increasingly heard from practitioners in the field. The export step may be needed to send clean data or derived features to other teams or systems in the user’s organization.
Yes: feather, I hear you thinking! Indeed feather is a new fast uncompressed binary format which is also cross-language, by Wes McKinney and Hadley Wickham. As Roger Pengpointed out R has had fast XDR binary format for many years but the innovation of feather is that it is compatible with Python and Julia and other systems will …read more
Source:: r-bloggers.com