Paper: Causality and Causal Inference in Epidemiology: the Need for a Pluralistic Approach

Delighted to announce the online publication of this paper in the International Journal of Epidemiology, with Jan Vandenbroucke and Neil Pearce: ‘Causality and Causal Inference in Epidemiology: the Need for a Pluralistic Approach’.

This paper has already generated some controversy and I’m really looking forward to talking about it with my co-authors at the London School of Hygiene and Tropical Medicine on 7 March. (I’ll also be giving some solo talks while in the UK, at Cambridge, UCL, and Oxford, as well as one in Bergen, Norway.)

The paper is on the same topic as a single-authored paper of mine published in late 2015, ‘Causation and Prediction in Epidemiology: a Guide to the Methodological Revolution.’ But it is much shorter, and nonetheless manages to add a lot that was not present in my sole-authored paper – notably a methodological dimension of which, as a philosopher by training, I was ignorant. The co-authoring process was thus really rich and interesting for me.

It also makes me think that philosophy papers should be shorter… Do we really need the first 2500 words summarising the current debate etc? I wonder if a more compressed style might actually stimulate more thinking, even if the resulting papers are less argumentatively airtight. One might wonder how often the airtight ideal is achieved even with traditional-length papers… Who was it who said that in philosophy, it’s all over by the end of the first page?

America Tour: Attribution, prediction, and the causal interpretation problem in epidemiology

Next week I’ll be visiting America to talk in Pittsburgh, Richmond, and twice at Tufts. I do not expect audience overlap so I’ll give the same talk in all venues, with adjustments for audience depending on whether it’s primarily philosophers or epidemiologists I’m talking to. The abstract is below. I haven’t got a written version of the paper that I can share yet but would of course welcome comments at this stage.

ABSTRACT

Attribution, prediction, and the causal interpretation problem in epidemiology

In contemporary epidemiology, there is a movement, part theoretical and part pedagogical, attempting to discipline and clarify causal thinking. I refer to this movement as the Potential Outcomes Approach (POA). It draws inspiration from the work of Donald Rubin and, more recently, Judea Pearl, among others. It is most easily recognized by its use of Directed Acyclic Graphs (DAGs) to describe causal situations, but DAGs are not the conceptual basis of the POA in epidemiology. The conceptual basis (as I have argued elsewhere) is a commitment to the view that the hallmark of a meaningful causal claim is that it can be used to make predictions about hypothetical scenarios. Elsewhere I have argued that this commitment is problematic (notwithstanding the clear connections with counterfactual, contrastive and interventionist views in philosophy). In this paper I take a more constructive approach, seeking to address the problem that troubles advocates of the POA. This is the causal interpretation problem (CIP). We can calculate various quantities that are supposed to be measures of causal strength, but it is not always clear how to interpret these quantities. Measures of attributability are most troublesome here, and these are the measures on which POA advocates focus. What does it mean, they ask, to say that a certain fraction of population risk of mortality is attributable to obesity? The pre-POA textbook answer is that, if obesity were reduced, mortality would be correspondingly lower. But this is not obviously true, because there are methods for reducing obesity (smoking, cholera infection) which will not reduce mortality. In general, say the POA advocates, a measure of attributability tells us next to nothing about the likely effect of any proposed public health intervention, rendering these measures useless, and so, for epidemiological purposes, meaningless.
In this paper I ask whether there is a way to address and resolve the causal interpretation problem without resorting to the extreme view that a meaningful causal claim must always support predictions in hypothetical scenarios. I also seek connections with the notorious debates about heritability.
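
A minimal sketch may make the worry about attributability concrete. It uses Levin's standard formula for the population attributable fraction. The prevalence, baseline risk and relative risk are invented for illustration and come from no study, and the parameter `side_effect_rr` is a hypothetical stand-in for whatever else a given method of reducing obesity does to mortality.

```python
def population_attributable_fraction(prevalence, relative_risk):
    """Levin's formula: fraction of population risk attributable to an exposure."""
    excess = prevalence * (relative_risk - 1.0)
    return excess / (1.0 + excess)


def risk_before(prevalence, baseline_risk, relative_risk):
    """Population mortality risk with the exposure present at the given prevalence."""
    return baseline_risk * ((1.0 - prevalence) + prevalence * relative_risk)


def risk_after(prevalence, baseline_risk, side_effect_rr):
    """Population risk once the exposure has been removed by a method that
    multiplies the risk of those it is applied to by side_effect_rr
    (1.0 means the method itself is harmless)."""
    return baseline_risk * ((1.0 - prevalence) + prevalence * side_effect_rr)


# Illustrative numbers only (assumptions, not data).
PREVALENCE, BASELINE_RISK, RELATIVE_RISK = 0.30, 0.010, 2.0

paf = population_attributable_fraction(PREVALENCE, RELATIVE_RISK)
before = risk_before(PREVALENCE, BASELINE_RISK, RELATIVE_RISK)

for label, side_effect_rr in [("harmless method", 1.0), ("harmful method", 2.5)]:
    after = risk_after(PREVALENCE, BASELINE_RISK, side_effect_rr)
    change = (before - after) / before
    print(f"{label}: PAF = {paf:.3f}, proportional change in risk = {change:.3f}")
```

Under these assumptions the harmless method lowers population risk by exactly the attributable fraction, which is the textbook reading. The harmful method removes the same exposure yet raises population risk, so the attributable fraction alone does not say what a particular intervention will achieve.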

Is consistency trivial in randomized controlled trials?

Here are some more thoughts on Hernan and Taubman’s famous 2008 paper, from a chapter I am finalising for the epidemiology entry in a collection on the philosophy of medicine. I realise I have made a similar point in an earlier post on this blog, but I think I am getting closer to a crisp expression. The point concerns the claimed advantage of RCTs for ensuring consistency. Thoughts welcome!

Hernan and Taubman are surely right to warn against too-easy claims about “the effect of obesity on mortality”, when there are multiple ways to reduce obesity, each with different effects on mortality, and perhaps no ethically acceptable way to bring about a sudden change in body mass index from say 30 to 22 (Hernán and Taubman 2008, 22). To this extent, their insistence on assessing causal claims as contrasts to well-defined interventions is useful.

On the other hand, they imply some conclusions that are harder to accept. They suggest, for example, that observational studies are inherently more likely to suffer from this sort of difficulty, and that experimental studies (randomized controlled trials) will ensure that interventions are well-specified. They express their point using the technical term “consistency”:

consistency… can be thought of as the condition that the causal contrast involves two or more well-defined interventions. (Hernán and Taubman 2008, S10)

They go on:

…consistency is a trivial condition in randomized experiments. For example, consider a subject who was assigned to the intervention group … in your randomized trial. By definition, it is true that, had he been assigned to the intervention, his counterfactual outcome would have been equal to his observed outcome. But the condition is not so obvious in observational studies. (Hernán and Taubman 2008, S11)

This is a non-sequitur, however, unless we appeal to a background assumption that an intervention—something that an actual human investigator actually does—is necessarily well-defined. Without this assumption, there is nothing to underwrite the claim that “by definition”, if a subject actually assigned to the intervention had been assigned to the intervention, he would have had the outcome that he actually did have.

Consider the intervention in their paper, one hour of strenuous exercise per day. “Strenuous exercise” is not a well-defined intervention. Weightlifting? Karate? Swimming? The assumption behind their paper seems to be that if an investigator “does” an intervention, it is necessarily well-defined; but on reflection this is obviously not true. An investigator needs to have some knowledge of which features of the intervention might affect the outcome (such as what kind of exercise one performs), and thus need to be controlled, and which don’t (such as how far west of Beijing one lives). Even randomization will not protect against confounding arising from preference for a certain type of exercise (perhaps because people with healthy hearts are predisposed both to choose running and to live longer, for example), unless one knows to randomize the assignment of exercise-types and not to leave it to the subjects’ choice.

This is exactly the same kind of difficulty that Hernan and Taubman press against observational studies. So the contrast they wish to draw, between “trivial” consistency in randomized trials and a much more problematic situation in observational studies, is a mirage. Both can suffer from failure to define interventions.
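
The exercise example can be sketched as a small simulation. Everything in it is an assumption made for illustration: the function name `running_minus_swimming`, the probabilities, and above all the stipulation that the type of exercise has no effect whatsoever on survival. It covers only subjects already assigned to the "strenuous exercise" arm, and compares what happens when they choose their own type of exercise with what happens when the type is itself randomized.

```python
import random


def running_minus_swimming(n, randomise_type, seed):
    """Difference in survival proportion (runners minus swimmers) within the
    exercise arm. By construction, exercise type has NO causal effect."""
    rng = random.Random(seed)
    counts = {"running": 0, "swimming": 0}
    survived = {"running": 0, "swimming": 0}
    for _ in range(n):
        healthy_heart = rng.random() < 0.5
        if randomise_type:
            p_run = 0.5  # type assigned by coin flip
        else:
            # Self-selection: a healthy heart predisposes subjects to choose running.
            p_run = 0.8 if healthy_heart else 0.2
        kind = "running" if rng.random() < p_run else "swimming"
        # Survival depends on heart health only, never on exercise type.
        p_survive = 0.9 if healthy_heart else 0.6
        counts[kind] += 1
        survived[kind] += rng.random() < p_survive
    return (survived["running"] / counts["running"]
            - survived["swimming"] / counts["swimming"])


diff_self_chosen = running_minus_swimming(20000, randomise_type=False, seed=1)
diff_randomised = running_minus_swimming(20000, randomise_type=True, seed=1)
print(f"self-chosen type: {diff_self_chosen:+.3f}")
print(f"randomised type:  {diff_randomised:+.3f}")
```

With self-chosen exercise, runners appear to survive markedly more often even though running does nothing in this toy world. The apparent effect shrinks towards zero only when the investigator knows to randomize the type as well, which is the point about needing prior knowledge of which features of an intervention matter.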

A Tale of Two Papers

I’m on my way back from the World Epi Congress in Anchorage, where causation and causal inference have been central topics of discussion. I wrote previously about a paper (Hernan and Taubman 2008) suggesting that obesity is not a cause of mortality. There is another, more recent paper published in July of this year, suggesting, more or less, that race is not a cause of health outcomes – or at least that it’s not a cause that can feature in causal models (VanderWeele and Robinson 2014). I can’t do justice to the paper here, of course, but I think this is a fair, if crude, summary of the strategy.

This paper is an interesting comparator for the 2008 obesity paper (Hernan and Taubman 2008). It shares the idea that there is a close link between (a) what can be humanly intervened on, (b) what counterfactuals we can entertain, and (c) what causes we can meaningfully talk about. This is a radical view about causation, much stronger than any position held by any contemporary philosopher of whom I’m aware. Philosophers who do think that agency or intervention are central to the concept of causation treat the interventions as in-principle ones, not things humans could actually do.

Yet feasibility of manipulating a variable really does seem to be a driver in this literature. In the paper on race, the authors consider what variables form the subject of humanly possible interventions, and suggest that rather than ask about the effect of race, we should ask what effect is left over after these factors are modelled and controlled for, under the umbrella of socioeconomic status. That sounds to me a bit like saying that we should identify the effects of being female on job candidates’ success by seeing what’s left after controlling for skirt wearing, longer average hair length, shorter stature, higher pitched voice, female names, etc. In other words, it’s very strange indeed. Perhaps it could be useful in some circumstances, but it doesn’t really get us any further with the question of interest – how to quantify the health effects of race, sex, and so forth.

Clearly, there are many conceptual difficulties with this line of reasoning. A good commentary was published with the paper (Glymour and Glymour 2014) which really dismantles the logic of the paper. But I think there are a number of deeper and more pervasive misunderstandings to be cleared up, misunderstandings which help explain why papers like this are being written at all. One is confusion between causation and causal inference; another is confusion between causal inference and particular methods of causal inference; and a third is a mix-up between fitting your methodological tool to your problem, and your problem to your tool.

The last point is particularly striking. What’s so interesting about these two papers (2008 & 2014) is that they seem to be trying to fit research problems to methods, not trying to develop methods to solve problems – even though this is ostensibly what they (at least VW&R 2014) are trying to do. To me, this is strongly reminiscent of Thomas Kuhn’s picture of science, according to which an “exemplary” bit of science occurs, and initiates a “paradigm”, which is a shared set of tools for solving “puzzles”. Kuhn was primarily influenced by physics, but this way of seeing things seems quite apt to explain what is otherwise, from the outside, really quite a remarkable, even bizarre, about-turn. Age, sex, race – these are staple objects of epidemiological study as determinants of health; and they don’t fit easily into the potential outcomes paradigm. It’s fascinating to watch the subsequent negotiation. But I’m quite glad that it doesn’t look like epidemiologists are going to stop talking about these things any time soon.

References

Glymour C and Glymour MR. 2014. ‘Race and Sex Are Causes.’ Epidemiology 25(4): 488-490.

Hernan M and Taubman S. 2008. ‘Does obesity shorten life? The importance of well-defined interventions to answer causal questions.’ International Journal of Obesity 32: S8–S14.

VanderWeele TJ and Robinson WR. 2014. ‘On the Causal Interpretation of Race in Regressions Adjusting for Confounding and Mediating Variables.’ Epidemiology 25(4): 473-484.

Potential Outcomes: Separating Insight from Ideology

I’m in Anchorage, preparing for the World Congress of Epidemiology. One of the sessions I’m speaking at is a consultation for the next edition of the Dictionary of Epidemiology. It’s a strange and delightful document, this Dictionary, since it sets out to define not only individual words but also the discipline of epidemiology as a whole. Thus it contains both mundane and metaphysical entries, from “death certificate” to “causality”. I’m billed to talk about “Defining Measures of Causal Strength”. There’s a lot to say: the current entries under causal-related terms could use some disciplining. But I’m particularly interested in orienting myself with regard to the “potential outcomes” view of causation, which seems to be the current big thing among epidemiologists.

The potential outcomes view is associated in particular with Miguel Hernan, a very smart epidemiologist at Harvard, and he has a number of nice papers on it. (I hope I don’t need to say that what follows is not a personal attack: I have great respect for Hernan, and am stimulated by his work. I’m just taking his view as exemplary of the potential-outcomes approach, in the way that philosophers typically do.)

In particular I’ve been engaged in a close reading of a paper on obesity by Hernan and Taubman (2008). Their view, as expressed in that paper, is an interesting mix of pragmatism and idealism. On the one (pragmatic) hand, they argue that causal questions are often ill-formed, and thus unanswerable. There is no answer to the question “What is the effect of body-mass index (BMI) on all-cause mortality?” because the different ways to intervene on BMI may result in different effects on mortality. Diet, exercise, a combination of diet and exercise, smoking, chopping off a limb – these are all ways to reduce BMI. Until we have specified which intervention we have in mind, we cannot meaningfully quantify the contribution of BMI to mortality.

This much is highly reminiscent of contrastivist theories of causation in philosophy. Contrastivist theories take causation to consist in counterfactual dependence, but differ from counterfactual theories in taking the form of causal statements to be implicitly contrastive: not “c causes e” but “c rather than C* causes e rather than E*”, where C* and E* are classes of events that could occur in the absence of c and e respectively. Against this background, Hernan and Taubman’s point is simply that, for an epidemiological investigator, it matters what contrast class we have in mind when we seek to estimate the size of an effect. This is a good point, especially in a context where one hopes to act on a causal finding. One had better be sure that one knows, not only that there is a causal connection between a given exposure and outcome, but also what will happen if a given intervention replaces the factor under investigation. I have called the failure to appreciate this point The Causal Fallacy and linked it to easy errors in prediction (see this previous post and Broadbent 2013, 82).

But there is another more troubling side to the view as it is expressed in this paper: that randomized controlled trials offer a protection against this error, and somehow force us to specify our interventions precisely. The argument for this claim is striking, but on reflection I fear it is specious.

Hernan and Taubman make a striking point: they say that an observational study might appear to be able to answer the question “What is the effect of BMI on all-cause mortality?” via a statistical analysis of data on BMI and mortality, while randomized controlled trials would not be able to answer this question directly: they would only be able to answer questions like: “What is the effect of reducing BMI via dietary interventions? / via exercise? / via both?” This apparent shortcoming of RCTs is, of course, a strength in disguise: the observational study is in fact not so informative, since it does not distinguish the effects of different ways of reducing BMI; while the RCTs do give us this information.

This argument is fallacious, however, for the following reasons.

  1. An observational study that includes the same information as the RCTs on the methods of reducing BMI would also be able to distinguish between the effects of these interventions.
  2. It is true that one could conduct an observational study which ignored the possibility that different methods of reducing BMI might themselves affect mortality. But that would be a bad study, since it would ignore the effects of known confounders. A good study would take these things into account.
  3. Conversely, it is a mistake to suppose that RCTs offer protection against this sort of error. The BMI case is a special one, precisely because there are so many ways to intervene to reduce BMI and we know that these could affect mortality. In truth, there are many ways to make any intervention. One may take a pill or a capsule or a suppository, on the equator or in the tropics, before or after a meal, and so on. Even in an RCT, the intervention is not fully specified. Rather, we simply assume that the differences don’t matter, or that if they do, they are “cancelled out” by the randomisation process.
  4. Randomized controlled trials are not controlled in the manner of true controlled experiments; rather, randomization is a surrogate for controlling. We hope that all the many differences between the circumstances of each intervention in the treatment group will either have no effect or, if they do, will have effects that are randomly distributed so as not to obscure the effect of the treatment. But in principle, it is still possible that this hope is not fulfilled. At a significance threshold of 0.05, chance alone will produce a spurious “significant” result in about one RCT in 20 when the treatment has no real effect; and perhaps more often among published RCTs, given publication bias (i.e. the fact that null results are harder to publish).
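
The one-in-20 figure in point 4 can be checked with a rough simulation. This is a sketch under stated assumptions rather than a model of any real trial: both arms share the same true risk, so every "significant" difference is due to chance, and the test is a simple two-proportion z-test using the normal approximation. The function names, arm size and risk are all hypothetical.

```python
import math
import random


def two_sided_p(deaths_a, deaths_b, n_per_arm):
    """Two-sided p-value from a two-proportion z-test with equal-sized arms."""
    pooled = (deaths_a + deaths_b) / (2 * n_per_arm)
    se = math.sqrt(pooled * (1 - pooled) * 2 / n_per_arm)
    if se == 0:
        return 1.0  # no events, or all events: nothing to compare
    z = (deaths_a - deaths_b) / n_per_arm / se
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rate(n_trials=2000, n_per_arm=200, risk=0.3, alpha=0.05, seed=1):
    """Proportion of simulated null trials that cross the significance threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        # Identical true risk in both arms: the treatment does nothing.
        deaths_a = sum(rng.random() < risk for _ in range(n_per_arm))
        deaths_b = sum(rng.random() < risk for _ in range(n_per_arm))
        hits += two_sided_p(deaths_a, deaths_b, n_per_arm) < alpha
    return hits / n_trials


rate = false_positive_rate()
print(f"share of null trials with p < 0.05: {rate:.3f}")
```

The share should come out close to 0.05, that is, roughly one null trial in 20. The simulation says nothing about publication bias, which would raise the proportion among trials that actually reach print.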

These are familiar points in the philosophical literature on randomised controlled trials (see esp. Worrall 2002). The point I wish to pull out is this. On the one hand, Hernan’s emphasis on getting a well-defined contrastive question is insightful and important. But on the other hand, it is wrong to think that RCTs solve the problem. True, in an RCT you must make an intervention. But it does not follow that one’s intervention is well-specified. There might be all sorts of features of the particular way that you intervene that could skew the results. And conversely, plug the corresponding “how it happened” info into a cohort study, and you will be able to obtain the same sorts of discrimination between these methods.
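
The point about plugging "how it happened" information into a cohort study can be illustrated with a toy table. The records below are invented, and the three methods and their death counts are assumptions chosen only to make the contrast visible.

```python
from collections import defaultdict

# Hypothetical cohort: (method by which BMI was reduced, died during follow-up).
records = (
    [("diet", False)] * 95 + [("diet", True)] * 5
    + [("exercise", False)] * 97 + [("exercise", True)] * 3
    + [("smoking", False)] * 80 + [("smoking", True)] * 20
)


def pooled_risk(rows):
    """Mortality risk ignoring how BMI was reduced."""
    return sum(died for _, died in rows) / len(rows)


def risk_by_method(rows):
    """Mortality risk within each recorded method of BMI reduction."""
    totals = defaultdict(int)
    deaths = defaultdict(int)
    for method, died in rows:
        totals[method] += 1
        deaths[method] += died
    return {method: deaths[method] / totals[method] for method in totals}


print(f"pooled risk: {pooled_risk(records):.3f}")
for method, risk in risk_by_method(records).items():
    print(f"{method}: {risk:.3f}")
```

The pooled figure hides the differences between methods; the stratified figures recover them. Nothing about this step requires randomization, only that the study recorded the method. Whether the stratified estimates are free of other confounding is a separate question that the sketch does not address.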

On top of all this, the focus on the methods of individual studies obscures the most important point of all: that convincing evidence comes from a multitude of studies. Just as an RCT allows us to assume that differences between individuals are evenly distributed and thus ignorable, so a multitude of methodologically inferior studies can provide very strong evidence if their methodological shortcomings are different. This is the kind of situation Hill responded to with his guidelines (NOT criteria!) for inferring causality (Hill 1965). Similarly, ad hoc arguments against each possible alternative explanation can add up to a compelling case, as in the classic paper by Cornfield and colleagues on smoking and lung cancer (Cornfield et al 1959). The recent insights of the potential outcomes approach are valuable and important, but they augment rather than replace these familiar, older insights.

References

Broadbent, A. 2013. Philosophy of Epidemiology. Basingstoke and New York: Palgrave Macmillan.

Cornfield J, Haenszel W, Hammond EC, Lilienfeld AM, Shimkin MB and Wynder EL. 1959. Smoking and lung cancer: recent evidence and a discussion of some questions. Journal of the National Cancer Institute 22: 173-203.

Hernan, MA and Taubman, SL. 2008. Does obesity shorten life? The importance of well-defined interventions to answer causal questions. International Journal of Obesity 32: S8-S14.

Hill, Austin Bradford. 1965. The environment and disease: association or causation? Proceedings of the Royal Society of Medicine 58: 295-300.

Worrall, J. 2002. What Evidence in Evidence-Based Medicine? Philosophy of Science 69 (S3): S316-S330.