by Seth Mottaghinejad, Data Scientist at Microsoft
R and big data
There are many R packages dedicated to letting users (or useRs, if you prefer) deal with big data in R. (We will intentionally avoid using proper case for ‘big data’, because (1) the term has become somewhat hackneyed, and (2) for the sake of this article we can think of big data as any dataset too large to fit into memory as a data.frame that standard R functions can operate on.) Even without third-party packages, base R puts some tools at our disposal, which boil down to doing one of two things: we can either format the data more economically so that it can still be squeezed into memory, or we can deal with the data piecemeal, bypassing the need to load it into memory all at once.
An example of the first approach is to format character vectors as factors, when doing so is appropriate, because a factor is stored as an integer vector under the hood, which takes less space than the strings it represents. There are of course many other advantages to using factors, but let’s not digress. An example of the second approach is to process the data only a certain number of rows at a time, i.e. chunk by chunk, where each chunk fits into memory and can be brought into R as a data.frame.
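As a minimal sketch of these two ideas (the file name "flights.csv" and the carrier codes are hypothetical stand-ins, not data from this post), the first snippet compares the memory footprint of a character vector with its factor equivalent, and the second reads a CSV file a fixed number of rows at a time by passing an open connection to base R’s read.csv:

```r
## Approach 1: a factor stores repeated strings as integer codes plus a levels table
carriers_chr <- rep(c("AA", "DL", "UA", "WN"), times = 250000)  # hypothetical values
carriers_fct <- factor(carriers_chr)
print(object.size(carriers_chr), units = "MB")  # character vector
print(object.size(carriers_fct), units = "MB")  # factor, roughly half the size here

## Approach 2: process the file chunk by chunk instead of loading it all at once
chunk_size <- 100000
con <- file("flights.csv", open = "r")           # hypothetical file name
first_chunk <- read.csv(con, nrows = chunk_size) # header plus first chunk
col_names <- names(first_chunk)
n_rows <- nrow(first_chunk)                      # stand-in for real per-chunk work
repeat {
  chunk <- tryCatch(
    read.csv(con, nrows = chunk_size, header = FALSE, col.names = col_names),
    error = function(e) NULL                     # read.csv errors when no lines remain
  )
  if (is.null(chunk)) break
  n_rows <- n_rows + nrow(chunk)                 # replace with any per-chunk computation
}
close(con)
n_rows
```

Each iteration only ever holds one chunk-sized data.frame in memory, which is the essence of the second approach.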
RevoScaleR and big data
The aforementioned chunk-wise processing of data is what the RevoScaleR package does behind the scenes. For example, with the rxLinMod function (the counterpart to base R’s lm function) we can run a regression model on a very large dataset, one that is presumably too large to fit into memory. Even if the dataset could still fit into memory (servers nowadays can easily have 500GB of RAM), processing it using …read more
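To make the comparison with lm concrete, here is a minimal sketch of rxLinMod, assuming the RevoScaleR package is available (it ships with Microsoft R Client / Machine Learning Server rather than CRAN); the XDF file name and column names below are hypothetical stand-ins in the style of the airline data used in the package’s own examples:

```r
## A sketch, assuming RevoScaleR is installed (Microsoft R Client / ML Server)
library(RevoScaleR)

## An XDF file is RevoScaleR's on-disk, chunked data format; rxLinMod streams
## through it one chunk at a time instead of loading it all into memory
airline_xdf <- RxXdfData("airline.xdf")  # hypothetical file name

## Counterpart to lm(): same formula interface, chunk-wise computation
fit <- rxLinMod(ArrDelay ~ DepDelay + DayOfWeek, data = airline_xdf)
summary(fit)
```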
Source: r-bloggers.com