In this post, I'm going to review some basic concepts from this work in the context of the two-variable case. My later posts review issues arising in the multiple variable case, including confounding and mediation. My concluding thoughts for the series are here.
I'm doing this as much for myself as for any particular audience, but maybe my rambling will be useful to others as well. For those who want to know more, go read Pearl's 2009 review paper. Seriously. It's remarkable.
Correlation and causation
Anyone who has taken an introductory statistics class has heard the dictum "correlation is not causation". Although phrased in terms of a correlation, this dictum embraces a much broader idea: one should not draw causal conclusions from an observational study.
Consider the following situation. You are interested in a particular psychological variable -- say, attitudes toward ice cream. You want to know whether the psychological variable causes behavior -- so, whether people who like ice cream a lot tend to buy more of it.
If you had some machine, an Attitude-O-Matic, that could freely set a person's ice cream attitudes to a particular level while leaving all other variables untouched, the task would be straightforward. You could simply find a bunch of people and measure their ice cream purchases under varying settings of the Attitude-O-Matic. Lacking said machine, you take an easier route -- you ask several thousand people about how much they like ice cream and their ice cream purchases over the past month. Graphing the relationship, you find something like the following:
This graph shows a clear positive association between attitudes and purchasing behavior -- once we know that someone has positive attitudes toward ice cream, we can predict with reasonable confidence that they purchase a lot of ice cream. Can we then conclude that attitudes cause ice cream purchases?
As we get drilled into our heads in statistics, the answer is no. The reason for this is that, in the two-variable case, three types of causal relationships could produce an association. Consider the following diagram.
Causation as action
The above discussion suggests that associative and causal concepts are not the same. Intuitively, this difference is about seeing versus doing. After all, although we expect to feel a change in temperature upon seeing a thermometer reading increase from 20 degrees Celsius to 30 degrees Celsius, we do not expect the same temperature increase after setting a thermometer that once read 20 degrees to the value of 30 degrees.
This difference between seeing and doing underlies Pearl's (1995) definition of causality. Pearl defines a causal action as one that sets a variable to a particular value while not changing anything else. Fundamentally, the process of setting a variable to a value is the process of changing it. The distinction between seeing and doing is also consistent with my description of the Attitude-O-Matic: If a process is available to change a variable, isolating its causal influence is easier than if no such process is available.
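To make the seeing-versus-doing distinction concrete, here is a minimal simulation sketch of the thermometer example. The model is an assumption made for illustration only: room temperature causes the thermometer reading (plus a little measurement noise), and the function name `simulate` and its `do_reading` argument are hypothetical stand-ins for setting the thermometer by hand.

```python
import numpy as np

def simulate(n=10_000, do_reading=None, seed=0):
    """Toy model (assumed): temperature -> thermometer reading."""
    rng = np.random.default_rng(seed)
    temperature = rng.uniform(15, 35, size=n)   # degrees Celsius
    noise = rng.normal(0, 0.5, size=n)          # measurement error
    if do_reading is None:
        # Seeing: the reading is generated by the temperature.
        reading = temperature + noise
    else:
        # Doing: we override the reading; nothing else in the model changes.
        reading = np.full(n, float(do_reading))
    return temperature, reading

# Seeing: compare the rooms where we happen to observe readings near 30 vs. 20.
temp, reading = simulate()
seen_diff = (temp[np.abs(reading - 30) < 0.5].mean()
             - temp[np.abs(reading - 20) < 0.5].mean())

# Doing: set every thermometer to 30, then to 20, and compare the rooms.
temp_30, _ = simulate(do_reading=30)
temp_20, _ = simulate(do_reading=20)
done_diff = temp_30.mean() - temp_20.mean()

print(f"seeing: rooms differ by about {seen_diff:.1f} degrees")
print(f"doing:  rooms differ by {done_diff:.1f} degrees")
```

Under these assumptions, observing a higher reading goes along with a warmer room, while forcing the reading leaves the room temperature exactly as it was, because the temperature equation never refers to the reading.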
The bright line between causal and associative concepts has another implication. No matter the amount of associative information you have, that information alone should not change your opinions about the causal relationships of the variables involved. In other words, seeing a rising thermometer value co-occur with rising temperatures should not change your belief about whether the thermometer values cause increasing temperatures, nor should it modify any other belief you have about causation.
Sometimes, however, associative data is all we have. Even in these cases, we are sometimes able to conclude with reasonable confidence that a causal relationship exists.
Consider the proposition that smoking causes lung cancer. We are reasonably confident that this proposition is true. Yet we did not conclude this by assigning people to different cigarette dosages. For one thing, such assignment would be unethical if we suspect that smoking creates harm. For another, if we observe a person develop cancer under a particular dosage of cigarettes, we cannot magic the cancer and smoking history away to see what would happen if history had unfolded differently. How, then, did we reach the conclusion that smoking causes cancer? Is this conclusion justifiable?
The answer is that, for a given study, we can only estimate a causal effect if we are willing to make certain causal assumptions.
Let's go back to the causal diagram that I introduced in the context of ice cream attitudes and purchases.
This diagram encodes three causal relationships:
(1) Attitudes → purchases ($a$), representing the causal effect of attitudes on purchases. This is the relationship that we want to estimate.
(2) Purchases → attitudes ($b$), representing the causal effect of purchases on attitudes.
(3) Unmeasured causes of attitudes ⟷ unmeasured causes of purchases ($c$), representing either shared unmeasured causes of attitudes and purchases or causal links between separate unmeasured causes.
What prevents us from estimating the causal influence of attitudes on ice cream purchases is the confounding influence of relationships $b$ and $c$. If we could eliminate these relationships, identifying relationship $a$ would be quite easy -- it would merely be the very correlation that we calculated at the beginning of the post.
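As a rough worked illustration, suppose (purely as an assumption for this sketch) that the relationships are linear, that path $b$ is already known to be 0, and that $c$ denotes the covariance between the unmeasured causes $U_A$ of attitudes ($A$) and $U_P$ of purchases ($P$). Then:

$$
\begin{aligned}
A &= U_A, \\
P &= aA + U_P, \qquad \operatorname{Cov}(U_A, U_P) = c, \\
\operatorname{Cov}(A, P) &= a\operatorname{Var}(A) + c, \\
\frac{\operatorname{Cov}(A, P)}{\operatorname{Var}(A)} &= a + \frac{c}{\operatorname{Var}(A)}.
\end{aligned}
$$

In this simplified setup, the slope we compute from the data equals $a$ only when $c = 0$; otherwise it is off by $c / \operatorname{Var}(A)$. When both variables are standardized, that slope is just the correlation.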
In fact, external knowledge sometimes makes us pretty sure that these other relationships are not present. For example, suppose I used my Attitude-O-Matic to assign people to a certain level of ice cream attitude, and I could ensure that my Attitude-O-Matic settings were essentially random with respect to the potential unmeasured causes of ice cream purchases (e.g., by linking them to a dice roll). I could then be reasonably confident that both the $b$ and $c$ paths are equal to 0.
This is the magic of random assignment -- it gives us a strong reason to believe that paths $b$ and $c$ are not present in our study, and therefore permits the identification of our causal effect of interest. It is important to emphasize, however, that even in the presence of random assignment, it is the assumptions we make about the causal relationships $b$ and $c$ that allow us to identify path $a$. In the case of random assignment, these assumptions are pretty justifiable, but random assignment can and will occasionally fail (for instance, when a small sample leaves the groups imbalanced by chance).
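The following simulation is a small sketch of why random assignment helps. Everything in it is assumed for illustration: a linear model, a true effect `a = 0.5`, a single unmeasured common cause `u` standing in for path $c$, and a six-sided die playing the role of the Attitude-O-Matic.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
a = 0.5  # assumed true causal effect of attitudes on purchases

# Unmeasured common cause of attitudes and purchases (path c).
u = rng.normal(size=n)

# Observational study: attitudes depend on u, so path c is open.
attitudes_obs = u + rng.normal(size=n)
purchases_obs = a * attitudes_obs + u + rng.normal(size=n)

# Randomized study: a dice roll sets attitudes, so they are independent of u
# (c = 0) and cannot have been caused by purchases (b = 0).
attitudes_rand = rng.integers(1, 7, size=n).astype(float)
purchases_rand = a * attitudes_rand + u + rng.normal(size=n)

def slope(x, y):
    """Least-squares slope of y on x."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

slope_obs = slope(attitudes_obs, purchases_obs)
slope_rand = slope(attitudes_rand, purchases_rand)

print(f"observational slope: {slope_obs:.2f}")
print(f"randomized slope:    {slope_rand:.2f}  (true a = {a})")
```

In this toy setup the observational slope lands near 1.0, roughly double the true effect, while the randomized slope lands near 0.5. The data-generating equation for purchases is the same in both studies; only the way attitudes were set differs.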
In general, then, we can delete the $b$ and $c$ paths by making assumptions about their absence. If we can delete both paths through assumption, and those assumptions actually hold, we are guaranteed that estimates of the $a$ path will be causal.
Unlike associative assumptions, such as the normality, linearity, and constant variance assumptions from the General Linear Model, causal assumptions are strong and untestable. Given a large enough sample, we can always determine whether it is reasonable to assume linear relationships. In contrast, our data cannot tell us whether we have an example of backward causation (path $b$) or unmeasured confounders (path $c$), even in principle. This means that, when we make causal assumptions, we are always in the position of defending them with background knowledge.
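A quick sketch of why the data alone cannot settle the direction of causation: below are two hypothetical worlds, one in which only path $a$ exists and one in which only path $b$ exists. The linear, Gaussian form and the value `r = 0.6` are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
r = 0.6  # assumed strength of the association

# World 1: attitudes cause purchases (only path a is present).
attitudes_1 = rng.normal(size=n)
purchases_1 = r * attitudes_1 + np.sqrt(1 - r**2) * rng.normal(size=n)

# World 2: purchases cause attitudes (only path b is present).
purchases_2 = rng.normal(size=n)
attitudes_2 = r * purchases_2 + np.sqrt(1 - r**2) * rng.normal(size=n)

corr_1 = np.corrcoef(attitudes_1, purchases_1)[0, 1]
corr_2 = np.corrcoef(attitudes_2, purchases_2)[0, 1]

print(f"world 1 (attitudes -> purchases): r = {corr_1:.2f}")
print(f"world 2 (purchases -> attitudes): r = {corr_2:.2f}")
```

Both worlds produce standard normal variables with the same correlation, so in this setup no amount of additional observational data would separate them. Only background knowledge about which world we are in can do that.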
What other sorts of knowledge suffice to justify assumptions about paths $b$ and $c$? Whether a given assumption is "justified" is a matter of judgment, but there are two common classes of justification:
(1) Temporal ordering. If we know that the shift in attitudes occurred before the change in purchases, we might justifiably assume that path $b = 0$, as causes must precede effects. We are generally more justified in applying this criterion if our variables are one-off events -- if they are instead ongoing things like attitudes, spending habits, and personalities, any change that we observe could well have existed prior to our measurements. In addition, temporal ordering does not necessarily justify the elimination of path $c$ -- it is still quite possible that a common cause (say, SES) causes both the change in attitudes and the change in purchasing behavior.
(2) Strong theory. Sometimes we know that a path is highly implausible. For example, if we are examining sex differences in mental rotation ability, we know that sex is fully characterized by one's sex chromosomes and cannot be influenced by rotating objects in one's head. This allows us to once again eliminate path $b$. However, we cannot necessarily eliminate path $c$ because one's sex chromosomes also cause differences in socialization experiences, which could plausibly affect mental rotation ability.
Depending on the specific content area, other criteria may serve as justifications for causal assumptions (see, for example, the Bradford-Hill criteria for epidemiology). In general, however, the most persuasive arguments for eliminating pathways rest on very strong logical grounds. Eliminating pathways should not be an ad-hoc decision, and these decisions should be explicitly discussed and defended by those who make them so that others have an opportunity to evaluate this reasoning.
What this means is that our causal inferences from associative data are only as defensible as our justifications for our causal assumptions. The case for the smoking → lung cancer relationship was built, step by step, through longitudinal evidence (thus satisfying the temporal ordering criterion) and by developing strong theory that eliminates the confounding pathway.
According to the Structural Causal Model, causal principles are fundamentally different from associative principles. This does not mean, however, that we cannot learn something from associative data -- if we are willing to specify our assumptions and defend them.
Pearl, J. (1995). Causal diagrams for empirical research. Biometrika, 82, 669–688.
Pearl, J. (2000). Causality: Models, reasoning, and inference. Cambridge, U.K.; New York: Cambridge University Press.
Pearl, J. (2009). Causal inference in statistics: An overview. Statistics Surveys, 3, 96–146.