In a previous post we saw how to perform Bayesian regression in R using STAN for normally distributed data. In this post we will look at how to fit non-normal models in STAN using three example distributions commonly found in empirical data: negative binomial (overdispersed Poisson data), gamma (right-skewed continuous data) and beta-binomial (overdispersed binomial data).
The STAN code for the different models is at the end of this post, together with some explanations.
Negative Binomial
The Poisson distribution is a common choice for modelling count data; it assumes that the variance is equal to the mean. When the variance is larger than the mean, the data are said to be overdispersed and the negative binomial distribution can be used. Say we have measured a response variable y that follows a negative binomial distribution and depends on a set of k explanatory variables X. In equations, this gives us:
$$ y_{i} \sim NB(\mu_{i}, \phi) $$
$$ E(y_{i}) = \mu_{i} $$
$$ Var(y_{i}) = \mu_{i} + \mu_{i}^{2} / \phi $$
$$ \log(\mu_{i}) = \beta_{0} + \beta_{1} * X1_{i} + \dots + \beta_{k} * Xk_{i} $$
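The mean and variance equations can be checked empirically with a few lines of base R. This is only a minimal sketch: it assumes that the `size` argument of `rnbinom` (in its `mu`/`size` parametrisation) plays the role of phi, and the values chosen for `mu`, `phi` and `n_draws` are arbitrary illustration values, not taken from the post.

```r
# Empirical check of E(y) = mu and Var(y) = mu + mu^2 / phi.
# Assumption: rnbinom(mu = , size = ) is used, with `size` standing in for phi.
set.seed(1)
n_draws <- 1e5   # number of simulated counts (arbitrary)
mu <- 10         # expected value (arbitrary)
phi <- 2         # overdispersion parameter (arbitrary)

y <- rnbinom(n_draws, mu = mu, size = phi)

emp_mean <- mean(y)
emp_var <- var(y)
theo_var <- mu + mu^2 / phi   # variance implied by the third equation

print(c(empirical_mean = emp_mean, expected_mean = mu))
print(c(empirical_var = emp_var, expected_var = theo_var))
```

With these settings the empirical variance should land close to the theoretical one, and well above the mean, which is exactly the overdispersion a Poisson model cannot capture.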
The negative binomial distribution has two parameters. \(\mu\) is the expected value, which must be positive; a log link function is therefore used to map the linear predictor (the explanatory variables times the regression parameters) to \(\mu\) (see the fourth equation). \(\phi\) is the overdispersion parameter: a small value means a large deviation from a Poisson distribution, while as \(\phi\) gets larger the negative binomial looks more and more like a Poisson distribution.
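Both points can be illustrated numerically. The sketch below first shows that the inverse of the log link (`exp`) turns any real-valued linear predictor into a positive mean, then compares the negative binomial probability mass function with the Poisson one for increasing values of phi. The variable names (`eta`, `phis`, `max_diff`) and all numeric values are made up for illustration, and `size` in `dnbinom` is again assumed to correspond to phi.

```r
# 1) Log link: exp() maps any real linear predictor to a positive mean.
eta <- c(-3, 0, 2.5)       # hypothetical linear predictor values
mu_from_eta <- exp(eta)    # always strictly positive
print(mu_from_eta)

# 2) Poisson limit: largest absolute difference between the negative
#    binomial and Poisson probabilities, for increasing phi.
# Assumption: dnbinom(mu = , size = ) with `size` standing in for phi.
mu <- 5
counts <- 0:60
phis <- c(0.5, 5, 50, 5000)

max_diff <- sapply(phis, function(phi) {
  max(abs(dnbinom(counts, mu = mu, size = phi) - dpois(counts, lambda = mu)))
})
names(max_diff) <- phis
print(max_diff)
```

The differences shrink as phi grows: with a small phi the two distributions are far apart, while with a very large phi they are nearly indistinguishable.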
Let’s simulate some data and fit a STAN model to them:
#load the libraries
library(arm) #for the invlogit function
library(emdbook) #for the rbetabinom function
library(rstan)
library(rstanarm) #for the launch_shinystan function

#simulate some negative binomial data
#the explanatory variables
N ...

Shinystan:
The last command should open a window in your browser with loads of options to diagnose, estimate and ...
Source: r-bloggers.com