
matrixStats: Optimized subsetted matrix calculations


By Henrik Bengtsson

(This article was first published on jottR, and kindly contributed to R-bloggers)

The matrixStats package provides highly optimized functions for computing common summaries over rows and columns of matrices. In a previous blog post, I showed that, instead of using apply(X, MARGIN=2, FUN=median), we can speed up calculations dramatically by using colMedians(X). In the most recent release (version 0.50.0), matrixStats has been extended to perform optimized calculations also on a subset of rows and/or columns specified via new arguments rows and cols, e.g. colMedians(X, cols=1:50).

It’s time to leave apply() behind.

For instance, assume we wish to find the median value of the first 50 columns of matrix X with 1,000,000 rows and 100 columns. For simplicity, assume

> X <- matrix(rnorm(1e6*100), nrow=1e6, ncol=100)

To get the median values without matrixStats, we would do

> y <- apply(X[,1:50], MARGIN=2, FUN=median)
> str(y)
num [1:50] -0.001059 0.00059 0.001316 0.00103 0.000814 ...

As in the past, we could use matrixStats to do

> y <- colMedians(X[,1:50])

which is much faster than apply() with median().
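
As a quick sanity check, the following sketch compares the two approaches on a much smaller matrix. The dimensions, the seed, and the variable names are arbitrary choices for illustration, not taken from the example above. It assumes matrixStats is installed, and the timings will vary by machine.

```r
library(matrixStats)

set.seed(42)  # arbitrary seed, for reproducibility only
X <- matrix(rnorm(1e4 * 100), nrow = 1e4, ncol = 100)  # smaller than the 1e6-row example

# Base R: apply median() to each of the first 50 columns
t_apply <- system.time(y_apply <- apply(X[, 1:50], MARGIN = 2, FUN = median))

# matrixStats: dedicated column-median routine on the same subset
t_ms <- system.time(y_ms <- colMedians(X[, 1:50]))

# Both approaches should give the same medians
stopifnot(isTRUE(all.equal(y_apply, y_ms)))

# Elapsed wall-clock time in seconds for each approach
print(c(apply = t_apply[["elapsed"]], colMedians = t_ms[["elapsed"]]))
```

On a matrix this small the difference may be modest; the gap is expected to grow with the number of rows and columns.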

However, both approaches require that X is subsetted before the actual calculations can be performed, i.e. the temporary object X[,1:50] is created. In this example, the size of the original matrix is ~760 MiB and the subsetted one is ~380 MiB;

> object.size(X)
800000200 bytes
> object.size(X[,1:50])
400000100 bytes
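
The MiB figures quoted above follow from the byte counts: one MiB is 2^20 bytes, and a numeric (double) matrix stores 8 bytes per element plus a small header. A minimal sketch of the arithmetic in base R, with illustrative variable names:

```r
bytes_full   <- 800000200   # object.size(X), as reported above
bytes_subset <- 400000100   # object.size(X[,1:50]), as reported above

mib <- 2^20                 # bytes per MiB

mib_full   <- bytes_full / mib     # about 763 MiB
mib_subset <- bytes_subset / mib   # about 381 MiB

# Data payload alone: 8 bytes per double, 1e6 rows, 100 or 50 columns
payload_full   <- 8 * 1e6 * 100    # 8e8 bytes
payload_subset <- 8 * 1e6 * 50     # 4e8 bytes

print(round(c(full = mib_full, subset = mib_subset), 1))
```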

This temporary object is created by (i) R first allocating memory for it and then (ii) copying all its values over from X. After the medians have been calculated, this temporary object is automatically discarded and eventually (iii) R’s garbage collector will deallocate its memory. This introduces overhead in the form of extra memory usage as well as processing time.

Starting with matrixStats 0.50.0, we can avoid this overhead by instead using

> y <- colMedians(X, cols=1:50)

This uses less memory, because no internal copy of X[,1:50] …
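
To see that the rows and cols arguments return the same values as subsetting first, here is a small sketch on a reduced matrix. The dimensions, seed, and variable names are assumptions made for illustration; it requires matrixStats 0.50.0 or later.

```r
library(matrixStats)

set.seed(42)  # arbitrary seed, for reproducibility only
X <- matrix(rnorm(1e4 * 100), nrow = 1e4, ncol = 100)

# Column subset: explicit subsetting versus the cols argument
y_copy   <- colMedians(X[, 1:50])        # subsets X before calling colMedians()
y_direct <- colMedians(X, cols = 1:50)   # subset is handled inside colMedians()
stopifnot(isTRUE(all.equal(y_copy, y_direct)))

# Row and column subsets can be combined in the same way
z_copy   <- colMedians(X[1:1000, 1:50])
z_direct <- colMedians(X, rows = 1:1000, cols = 1:50)
stopifnot(isTRUE(all.equal(z_copy, z_direct)))
```

The results match in both cases, so switching to rows and cols should only change memory use and speed, not the values returned.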

Source: r-bloggers.com

