Fun with Simpson’s Paradox: Simulating Confounders
By Joseph Rickert Wikipedia describes Simpson’s paradox as “a trend that appears in different groups of data but disappears or reverses when these groups are combined.” Here is the figure from the top...
Retrieving Data from Google Books with `ngramr`
By Daniel (This article was originally published at Daniel, and syndicated at StatsBlogs.) Karl Marx is one of the most famous founding fathers of modern sociology, with a popularity peak in 1975-6, but...
Twitter at conferences
By Egon Willighagen I have been happily tweeting the BioMedBridges meeting in Hinxton last week using the #lifesciencedata hashtag, along with more than 100 others, though a small subset was really...
RRegrs: exploring the space of possible regression models
By Egon Willighagen Machine learning is a field of science that focusses on mathematically describing patterns in data. Chemometrics does this for chemical data. Examples are (nano)QSAR where...
The relation between p-values and the probability H0 is true is not weak...
By Daniel Lakens The journal of Basic and Applied Social Psychology banned the p-value in 2015, after Trafimow (2014) had explained in an editorial a year earlier that inferential statistics were no...
Sunday morning puzzle
By xi’an A question from X validated that took me quite a while to fathom and then the solution suddenly became quite obvious: If a sample taken from an arbitrary distribution on {0,1}⁶ is censored...
statistically significant trends with multiple years of complex survey data
By Anthony Damico guest post by my friend thomas yokota, an oahu-based epidemiologist. palermo professor vito muggeo wrote the joinpoint analysis section of the code below to demonstrate that the...
PowerBI adds support for R
By David Smith In the latest update released on November 20, PowerBI has added support for R. The desktop edition of Microsoft’s data visualization and reporting tool now allows you to run an R script...
How to extract FASTQ from the new MinION FAST5 format using poRe
By biomickwatson A little while ago I demonstrated how to extract FASTQ from MinION FAST5 files using Rscript and the Linux command line. In that article, I described how to extract the different FASTQ...
Visualizing MLS Player Salaries with ggplot2
By Teja Kodali Recently, I came across this great visualization of MLS Player salaries. I tried to do something similar with ggplot2, and while I was unable to replicate the interactivity or the...
Setting up an AWS instance for R, RStudio, OpenCPU, or Shiny Server
By gluc While most web-developers have worked with Amazon AWS, Microsoft Azure, or similar platforms before, this is still not the case for many R number crunchers. Especially researchers at academic...
Using Apache SparkR to Power Shiny Applications: Part I
By emaasit Introduction The objective of this blog post is to demonstrate how to use Apache SparkR to power Shiny applications. I have been curious about what the use cases for a “Shiny-SparkR”...
Scaling data.table using index
By Jan Górecki – R R can handle fairly big data working on a single machine; 2B (2E9) rows and a couple of columns require about 100 GB of memory. This is already well enough to care about...
DataOps at SQL in the City
By Angela Roberts By Steph Locke Back in October, I had the pleasure of going along to the annual Redgate conference SQL in the City. This was a great day full of informative talks on how people are...
R Workshop at SFS Meeting
By fishR Blog I just noticed that there will be an Introduction to R workshop at the Society for Freshwater Science Annual Meeting in Sacramento on 20-May-20-16. Here is a link to the announcement....
a programming bug with weird consequences
By xi’an (This article was originally published at Xi’an’s Og » R, and syndicated at StatsBlogs.) One student of mine coded by mistake an independent Metropolis-Hastings algorithm with too small a...
R online classes with leading experts at statistics.com (33% discount)
By Tal Galili Statistics.com is an online learning website with 100+ courses in statistics, analytics, data mining, text mining, forecasting, social network analysis, spatial analysis, etc. They have...
Statistical Models That Support Design Thinking: Driver Analysis vs. Partial...
By Joel Cadwell We have been talking about design thinking in marketing since Tim Brown’s Harvard Business Review article in 2008. It might be easy for the data scientist to dismiss the approach as...
Making an R based ML model accessible through a simple API
By FishyOperations Building an accurate machine learning (ML) model is a feat on its own. But once you’re there, you still need to find a way to make the model accessible to users. If you want to...