Channel: r software hub

Fun with Simpson’s Paradox: Simulating Confounders

By Joseph Rickert

Wikipedia describes Simpson’s paradox as “a trend that appears in different groups of data but disappears or reverses when these groups are combined.” Here is the figure from the top of that article. (If you click on the image in Wikipedia and follow the “more details” link, you will find the R code used to generate it; there is a lot of R in Wikipedia.)

I rearranged the values into a data frame, which makes it easier to think of the “color” column as a confounding variable:

 x  y  color
 1  6      1
 2  7      1
 3  8      1
 4  9      1
 8  1      2
 9  2      2
10  3      2
11  4      2
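The snippet that builds simpson_data is not shown here. A minimal sketch, assuming numeric columns named x, y and color: the names come from the model formulas below, and color is assumed numeric because lm() reports a single color coefficient rather than a factor level such as color2.

```r
# Sketch: rebuild the eight points listed above as a data frame.
# Column names are inferred from the formulas y ~ x and y ~ x + color.
simpson_data <- data.frame(
  x     = c(1, 2, 3, 4, 8, 9, 10, 11),
  y     = c(6, 7, 8, 9, 1, 2, 3, 4),
  color = c(1, 1, 1, 1, 2, 2, 2, 2)  # numeric group label, not a factor
)
print(simpson_data)
```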

If we do not consider this confounder, we find that the coefficient of x is negative (the dashed line in the figure above):

coefficients(lm(y ~ x, data=simpson_data))
## (Intercept)           x 
##   8.3333333  -0.5555556
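The marginal slope can be checked by hand. For the eight points, the means are 48/8 = 6 for x and 40/8 = 5 for y, and the ordinary least-squares formulas reproduce the output above:

$$\hat\beta_x = \frac{\sum_i (x_i-\bar x)(y_i-\bar y)}{\sum_i (x_i-\bar x)^2} = \frac{-60}{108} = -\frac{5}{9} \approx -0.5556, \qquad \hat\beta_0 = \bar y - \hat\beta_x \bar x = 5 + \frac{5}{9}\cdot 6 = \frac{25}{3} \approx 8.3333$$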

If we do take the confounder into account, we see that the coefficient of x is positive:

coefficients(lm(y ~ x + color, data=simpson_data))
## (Intercept)           x       color 
##          17           1         -12
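Fitting the line separately within each color shows where the positive coefficient comes from: both groups have slope 1, with intercepts 5 and -7, which is exactly 17 + x - 12*color evaluated at color = 1 and color = 2. A sketch (the data frame is rebuilt from the table above so the snippet runs on its own):

```r
simpson_data <- data.frame(
  x     = c(1, 2, 3, 4, 8, 9, 10, 11),
  y     = c(6, 7, 8, 9, 1, 2, 3, 4),
  color = c(1, 1, 1, 1, 2, 2, 2, 2)
)

# Fit y ~ x within each color; sapply() returns a 2 x 2 matrix with
# rows "(Intercept)" and "x" and one column per color
group_coefs <- sapply(split(simpson_data, simpson_data$color),
                      function(d) coef(lm(y ~ x, data = d)))
group_coefs
```

The within-group lines rise, but the color 2 group sits further right and lower down, so a single pooled line tilts downward.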

In his book Causality, Judea Pearl makes a more sweeping statement regarding Simpson’s paradox: “Any statistical relationship between two variables may be reversed by including additional factors in the analysis.” [Pearl2009]
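Pearl's statement also runs in the opposite direction from the Wikipedia example: a positive overall slope can turn negative once a third variable is added. The following is an illustrative simulation, not code taken from this post. The binary confounder z, the sample size, the seed and all coefficients are arbitrary assumptions chosen so the effect is large:

```r
set.seed(1)  # arbitrary seed so the run is reproducible
n <- 200

# Hypothetical confounder z (0/1) that raises both x and y
z <- rbinom(n, size = 1, prob = 0.5)
x <- 2 + 6 * z + rnorm(n)
# Holding z fixed, each unit of x lowers y by 1
y <- 3 - x + 10 * z + rnorm(n, sd = 0.5)

marginal    <- unname(coef(lm(y ~ x))["x"])      # population value +0.5
conditional <- unname(coef(lm(y ~ x + z))["x"])  # population value -1
round(c(marginal = marginal, conditional = conditional), 2)
```

Ignoring z, the slope on x is positive because, with var(z) = 0.25, var(x) = 36 * 0.25 + 1 = 10 and cov(x, y) = -var(x) + 10 * cov(x, z) = -10 + 15 = 5, giving 5/10 = 0.5. Conditioning on z recovers the -1 built into the simulation.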

That sounds fun; let’s try it.

First we’ll make variables x and y with a simple linear relationship. I’ll use the same slopes and intercepts as in the Wikipedia figure, both to show the parallel and to demonstrate the incredible cosmic power I have …

Source: r-bloggers.com

