by Edward Ma and Vishrut Gupta (Hewlett Packard Enterprise)
A few weeks ago, we revealed ddR (Distributed Data-structures in R), an exciting new project started by R-Core, Hewlett Packard Enterprise, and others that provides a new set of computational primitives for distributed and parallel computing in R. The package plants the seed for what may become a standardized, easy way to write parallel algorithms in R, regardless of the computational engine of choice.
In designing ddR, we wanted to keep things simple and familiar. We expose only a small number of new user functions that are very close in semantics and API to their R counterparts. You can read the introductory material about the package here. In this post, we show how to use ddR functions.
Classes dlist, darray, and dframe: These classes are the distributed equivalents of list, matrix, and data.frame, respectively. To keep their APIs close to those of the vanilla R classes, we implemented operators and functions that work on these classes in the same ways. The example below creates two distributed lists: one of five 3s and one of the elements 1 through 5.
a
b
The argument nparts specifies the number of partitions to split the resulting dlist b into. For darrays and dframes, which are two-dimensional, nparts also permits a two-element vector, which specifies the two-dimensional partitioning of the output.
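As a rough local analogue of the two lists described above, the sketch below builds them with base R's mapply and lapply, the functions that dmapply and dlapply are modeled on. It then cuts the second list into three contiguous chunks to give a picture of what a partition count means. The names a_local, b_local, chunk_size, and parts are hypothetical, the choice of three chunks is arbitrary, and the chunking rule is only an illustration, not ddR's actual partitioning scheme.

```r
# Base R stand-ins for the two lists: five 3s, and the elements 1 through 5.
a_local <- mapply(function(x) x, rep(3, 5), SIMPLIFY = FALSE)
b_local <- lapply(1:5, function(x) x)

# Illustration only: cut b_local into 3 contiguous chunks.
# This is an assumed chunking rule, not how ddR assigns partitions.
n_chunks   <- 3
chunk_size <- ceiling(length(b_local) / n_chunks)
parts      <- split(b_local, ceiling(seq_along(b_local) / chunk_size))

# Number of elements held by each chunk.
print(lengths(parts))
```

In a distributed setting each chunk would live on a worker, so the partition count controls how much parallelism an operation on the list can use.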
Functions dmapply and dlapply: Following R’s functional-programming paradigm, we have created these two functions as the distributed equivalents of R’s mapply and lapply. One can supply any combination of distributed objects and regular R arguments to dmapply:
addThenSubtract <- function(x,y,z) { x + y - z } …read more
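To make the pattern concrete without a distributed backend, here is a minimal sketch using base R's mapply, the function dmapply is described as mirroring. Two arguments vary element by element while the third is passed as a constant through MoreArgs. The input values and the name res are assumptions chosen for illustration, and the sketch does not show how dmapply itself is called.

```r
# The three-argument function from the text.
addThenSubtract <- function(x, y, z) {
  x + y - z
}

# x and y are iterated over in parallel; z is held constant for every call.
res <- mapply(addThenSubtract, 1:5, 6:10, MoreArgs = list(z = 10))

print(res)
```

With a distributed version, the vectors passed elementwise could be replaced by distributed objects, while plain R values such as z would be shipped to each call unchanged.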
Source: r-bloggers.com