By Joseph Rickert
The ability to generate synthetic data with a specified correlation structure is essential to modeling work. As you might expect, R’s toolbox of packages and functions for generating and visualizing data from multivariate distributions is impressive. The basic function for generating multivariate normal data is mvrnorm() from the MASS package included in base R, although the mvtnorm package also provides functions for simulating both multivariate normal and t distributions. (For tutorial on how to use R to simulate from multivariate normal distributions from first principles using some linear algebra and the Cholesky decomposition see the astrostatistics tutorial on Multivariate Computations.)
The following block of code generates 5,000 draws from a bivariate normal distribution with mean (0,0) and covariance matrix Sigma printed in code. The function kde2d(), also from the Mass package generates a two-dimensional kernel density estimation of the distribution’s probability density function.
# SIMULATING MULTIVARIATE DATA # https://stat.ethz.ch/pipermail/r-help/2003-September/038314.html # lets first simulate a bivariate normal sample library(MASS) # Simulate bivariate normal data mu <- c(0,0) # Mean Sigma <- matrix(c(1, .5, .5, 1), 2) # Covariance matrix # > Sigma # [,1] [,2] # [1,] 1.0 0.1 # [2,] 0.1 1.0 # Generate sample from N(mu, Sigma) bivn <- mvrnorm(5000, mu = mu, Sigma = Sigma ) # from Mass package head(bivn) # Calculate kernel density estimate bivn.kde <- kde2d(bivn[,1], bivn[,2], n = 50) # from MASS package
R offers …read more
Source:: http://revolutionanalytics.com