Quantcast
Channel: r software hub
Viewing all articles
Browse latest Browse all 1015

How long could it take to run a regression

$
0
0

By arthur charpentier

This afternoon, while I was discussing with Montserrat (aka @mguillen_estany) we were wondering how long it might take to run a regression model. More specifically, how long it might take if we use a Bayesian approach. My guess was that the time should probably be linear in , the number of observations. But I thought I would be good to check.

Let us generate a big dataset, with one million rows,

> n=1e6
> X=runif(n)
> Y=2+5*X+rnorm(n)
> B=data.frame(X,Y)

Consider as a benchmark the standard linear regression,

> lm_freq = function(n){
+   idx = sample(1:1e6,size=n)
+   reg = lm(Y~X,data=B[idx,])
+   summary(reg)
+ }

Here the regression is a subset of smaller size. We can do the same with a Bayesian approach, using stan,

> stan_lm ="
+ data {
+ int N;
+ vector[N] x;
+ vector[N] y;
+ }
+ parameters {
+ real alpha;
+ real beta;
+ real tau;
+ }
+ transformed parameters {
+ real sigma;
+ sigma + }
+ model{
+ y ~ normal(alpha + beta * x, sigma);
+ alpha ~ normal(0, 10);
+ beta ~ normal(0, 10);
+ tau ~ gamma(0.001, 0.001);
+ }
+ "

Define then the model

> library(rstan)
> system.time( 
  stanmodel utilisateur     système      écoulé 
      0.043       0.000       0.043

We want to see how long it might take to run a regression,

> lm_bayes = function(n){
+   idx = sample(1:1e6,size=n)
+   fit = sampling(stanmodel,
+       data = list(N=n,
+                   x=X[idx],
+                   y=Y[idx]),
+       iter = 1000, warmup=200)
+   summary(fit)
+ }

We use the following package to …read more

Source:: r-bloggers.com


Viewing all articles
Browse latest Browse all 1015

Trending Articles