Generating and Visualizing Multivariate Data with R

By Joseph Rickert

The ability to generate synthetic data with a specified correlation structure is essential to modeling work. As you might expect, R’s toolbox of packages and functions for generating and visualizing data from multivariate distributions is impressive. The basic function for generating multivariate normal data is mvrnorm() from the MASS package, which ships with R as a recommended package, although the mvtnorm package also provides functions for simulating from both multivariate normal and multivariate t distributions. (For a tutorial on how to use R to simulate from multivariate normal distributions from first principles, using some linear algebra and the Cholesky decomposition, see the astrostatistics tutorial on Multivariate Computations.)
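To make the first-principles route concrete, here is a minimal sketch of the Cholesky approach using only base R. It is not taken from the tutorial mentioned above. The seed, the sample size, and the names n, R, Z and X are illustrative assumptions. The idea is that if the rows of Z are independent standard normal vectors and t(R) %*% R equals Sigma, then the rows of Z %*% R have covariance Sigma.

```r
# Sketch: simulate from N(mu, Sigma) via the Cholesky factor of Sigma.
# The seed, n, and variable names are assumptions chosen for illustration.
set.seed(42)
n     <- 5000
mu    <- c(0, 0)
Sigma <- matrix(c(1, 0.5, 0.5, 1), 2)

R <- chol(Sigma)                               # upper triangular, t(R) %*% R == Sigma
Z <- matrix(rnorm(n * length(mu)), nrow = n)   # n x 2 independent N(0, 1) draws
X <- sweep(Z %*% R, 2, mu, "+")                # add the mean to each column

round(cov(X), 2)                               # should be close to Sigma
```

In practice mvrnorm() is the more convenient choice. The sketch is only meant to show what a multivariate normal generator has to do in principle.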

The following block of code generates 5,000 draws from a bivariate normal distribution with mean (0,0) and the covariance matrix Sigma shown in the code comments. The function kde2d(), also from the MASS package, computes a two-dimensional kernel density estimate of the distribution’s probability density function.

# SIMULATING MULTIVARIATE DATA
# https://stat.ethz.ch/pipermail/r-help/2003-September/038314.html
# Let's first simulate a bivariate normal sample
library(MASS)
# Simulate bivariate normal data
mu <- c(0,0)                         # Mean
Sigma <- matrix(c(1, .5, .5, 1), 2)  # Covariance matrix
# > Sigma
#      [,1] [,2]
# [1,]  1.0  0.5
# [2,]  0.5  1.0
 
# Generate sample from N(mu, Sigma)
bivn <- mvrnorm(5000, mu = mu, Sigma = Sigma)   # from MASS package
head(bivn)                                      
# Calculate kernel density estimate
bivn.kde <- kde2d(bivn[,1], bivn[,2], n = 50)   # from MASS package
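One way to look at the density estimate is with base graphics, which accept the list of x, y and z components that kde2d() returns. The following is a sketch rather than the article's own plotting code. The seed, the axis labels and the viewing angles are assumptions, and the data are regenerated so that the block runs on its own.

```r
# Sketch: visualize a kde2d() estimate with base graphics.
# Seed, axis labels, and viewing angles are illustrative assumptions.
library(MASS)

set.seed(42)
Sigma    <- matrix(c(1, 0.5, 0.5, 1), 2)
bivn     <- mvrnorm(5000, mu = c(0, 0), Sigma = Sigma)
bivn.kde <- kde2d(bivn[, 1], bivn[, 2], n = 50)   # list with x, y, z

image(bivn.kde, xlab = "x1", ylab = "x2")         # heat map of the density
contour(bivn.kde, add = TRUE)                     # overlay contour lines
persp(bivn.kde, phi = 45, theta = 30,             # 3D surface of the density
      xlab = "x1", ylab = "x2", zlab = "density")
```

The positive correlation of 0.5 should show up as elliptical contours tilted along the diagonal.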

R offers …read more

Source: http://revolutionanalytics.com

