By Ben Ogorek
Regression is a tool that can be used to address causal questions in an observational study, though no one said it would be easy. While this article won’t close the vexing gap between correlation and causation, it will offer specific advice when you’re after a causal truth – keep an eye out for variables called “colliders,” and keep them out of your regression!
By the end of this article, we will have explored a situation where adding a variable to a regression will simultaneously
- improve the predictive power, and
- ruin the coefficient estimates.
Thus, the mistake is a tempting one to make. In the sections below, we’ll first review how adding additional variables to a regression can defeat confounding and lead us closer to a causal truth. Then we’ll see that truth evaporate when a variable we thought was a confounder was actually something called a “collider.” As is typical with Anything but R-bitrary articles, there will be lightweight simulations in R to drive the point home.
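As a preview of the punchline, here is a minimal sketch in the spirit of the simulations that follow (the variable names and coefficients are made up for illustration, not taken from the article's own code): x causes y, and both x and y cause z, making z a collider. Conditioning on z boosts R-squared while biasing the coefficient on x.

```r
# Hypothetical illustration of the collider trap (made-up coefficients)
set.seed(1234)
n <- 10000
x <- rnorm(n)
y <- 1.0 * x + rnorm(n)            # true causal effect of x on y is 1.0
z <- 0.8 * x + 0.8 * y + rnorm(n)  # z is a collider: a common effect of x and y

summary(lm(y ~ x))      # coefficient on x lands near 1.0 (unbiased)
summary(lm(y ~ x + z))  # higher R-squared, but the coefficient on x is badly biased
```

Run it and you will see the second regression fit better by every predictive measure, even as its estimate of x's effect drifts far from the true value of 1.0.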
Sentences that begin with “Controlling for [factors X, Y, and Z], …” are reassuring amidst controversial subject matter. But less reassuring is the implicit assumption that X, Y, and Z are indeed the things we call “confounders.” We review the definition of a confounder via the following causal graph:
In the above diagram, w is a confounder and will distort the perceived causal relationship between x and y if unaccounted for. An example from Counterfactuals and Causal Inference: Methods and Principles for Social Research is the effect of educational attainment (x) on earnings (y), where mental ability (w) is a confounder. The authors remark that the amount of “ability bias” in estimates of educational impact “has remained …
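To make the confounding structure concrete, here is a minimal R sketch with made-up coefficients (again, an illustration rather than the article's own simulation): w stands in for ability and drives both x (educational attainment) and y (earnings), while x also has a genuine effect on y.

```r
# Hypothetical illustration of confounding (made-up coefficients)
set.seed(5678)
n <- 10000
w <- rnorm(n)                      # confounder, e.g. mental ability
x <- 0.7 * w + rnorm(n)            # treatment, e.g. educational attainment
y <- 1.0 * x + 0.7 * w + rnorm(n)  # outcome, e.g. earnings; true effect of x is 1.0

summary(lm(y ~ x))      # coefficient on x is biased upward by the omitted w
summary(lm(y ~ x + w))  # controlling for w recovers a coefficient near 1.0
```

Here adding w to the regression is exactly the right move: it is a common cause of x and y, and conditioning on it removes the bias rather than creating it.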