ML in epidemiology: thoughts on Keil and Edwards 2018

Found a great paper (that I should already have known about, of course): Keil, A.P., Edwards, J.K. You are smarter than you think: (super) machine learning in context. Eur J Epidemiol 33, 437–440 (2018).

This really enjoyable article is one I would recommend to philosophers of science, medicine, and epidemiology looking for interesting leads on the interaction between epidemiology and ML – as well as to its target audience, epidemiologists.

Here are some very brief, unfiltered thoughts.

  1. Keil and Edwards discuss an approach, “super learning”, that assembles the results of a bundle of different methods and returns the best (as defined by a user-specified, but objective, measure). In an example, they show how adding a method to that bundle can produce a worse result. Philosophically, this resonates with familiar facts about non-deductive reasoning, namely that as you add information, you can “break” an inference, whereas adding information to the premise set of a deductive argument does not invalidate the inference provided the additional information is consistent with what’s already there. Not sure what to make of the resonance yet, but it reminds me of counterexamples to deductive-nomological explanation – which is like ML in being formal.
  2. They point out that errors like this are reasonably easy for humans to spot, and conclude: “We should be cautious, however, that the billions of years of evolution and experience leading up to current levels of human intelligence is not ignored in the context of advances to computing in the last 30 years.” I suppose my question would be whether all such errors are easy for humans to spot, or whether only the ones we spot are easy to spot. Here, there is a connection with the general intellectual milieu around Kahneman and Tversky’s work on biases. We are indeed honed by evolution, but this leads us to error outside of our specific domain, and statistical reasoning is one well-documented error zone for intuitive reasoning. I’m definitely not disagreeing with their scepticism about formal approaches, but I’m urging some even-handed scepticism about our intuitions. Where the machine and the human disagree, it seems to me a toss-up who, if either, is right.
  3. The assimilation of causal inference to a prediction problem is very useful, and one I’ve also explored. It deserves wider appreciation among just about everyone. What would be nice is to see more discussion of prediction under intervention, which, according to some, is categorically different from other kinds of prediction. Will machine learning prove capable of making predictions about what will happen under interventions? If so, will this yield causal knowledge as a matter of definition, or could the resulting predictions be generated in a way that is epistemically opaque? Interventionism in philosophy, causal inference in epidemiology, and the “new science of cause and effect” might just see their ideas put to empirical test, if epidemiology picks up ML approaches in coming years. An intervention-supporting predictive algorithm that does not admit of a ready causal interpretation would force a number of books to be rewritten. Of course, according to those books, it should be impossible; but the potency of a priori reasoning about causation is, to say the least, disputed.
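The selection step in point 1 can be sketched in a few lines. This is a minimal, hypothetical illustration, not Keil and Edwards’ implementation: each candidate method is fitted on a training split and scored on a held-out split with a user-specified loss, and the best scorer wins. All the names and the toy candidate bundle below are my own assumptions. The point is that adding a candidate can change which method is selected, which is how enlarging the bundle can worsen downstream performance.

```python
import numpy as np

def discrete_super_learner(candidates, x_train, y_train, x_val, y_val, loss):
    """Fit each candidate on the training split, score its predictions on the
    held-out split with the user-specified loss, and return the best scorer."""
    scores = {}
    for name, fit in candidates.items():
        predict = fit(x_train, y_train)            # each fitter returns a prediction function
        scores[name] = loss(y_val, predict(x_val))
    best = min(scores, key=scores.get)
    return best, scores

def poly_fitter(degree):
    """Fitter for a polynomial of the given degree."""
    def fit(x, y):
        coefs = np.polyfit(x, y, degree)
        return lambda x_new: np.polyval(coefs, x_new)
    return fit

# Hypothetical bundle: constant mean, linear fit, and a flexible degree-9 fit.
candidates = {
    "mean": lambda x, y: (lambda x_new: np.full_like(x_new, y.mean())),
    "linear": poly_fitter(1),
    "degree-9": poly_fitter(9),   # the extra method added to the bundle
}

mse = lambda y, yhat: float(np.mean((y - yhat) ** 2))  # user-specified measure

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 40)
y = 2 * x + rng.normal(0, 0.5, 40)                 # truly linear signal plus noise
best, scores = discrete_super_learner(candidates, x[:30], y[:30], x[30:], y[30:], mse)
```

If the flexible degree-9 fit happens to score best on a small validation split while generalising worse, the enlarged bundle delivers a worse learner than the original one – the kind of error that a human looking at the fitted curve spots easily.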

Western Cape COVID-19 levels higher than rest of SA. Is it because they defy lockdown there? Probably not, says phone data

Evidence from phone data suggests that W Cape adherence to lockdown has been quite strict, so lack of adherence is less likely to be the cause of the spike there. Thanks to Monomiat Ebrahim for the share.

Wondering if this means it is more likely to be:

1. A demographic feature such as age

2. A latitude feature – around the equator, COVID-19 has generally been less prevalent

3. A climate feature

4. High concentrations of “starters” leading to a critical mass for an epidemic

…add your pet hypothesis here!

From Judea Pearl’s blog: report of a webinar: “Artificial Intelligence and COVID-19: A wake-up call” #epitwitter @TheBJPS

Check the entry on Pearl’s blog, which includes a write-up provided by the organisers.

Video of the event is available too.

This Thursday at 11:30am (via Zoom) the @CHESS_DurhamUni reading group will be discussing our recent report from the IFK, ‘A Framework for Decisions in a Post-COVID World’ by @AlexBroadbent . . . please get in touch for the paper and joining instructions #COVID19 #socialpolicy #policymakers

“…regardless of government interventions [, after] around a two-week exponential growth of cases (and, subsequently, deaths) some kind of break kicks in, and growth starts slowing down. The curve quickly becomes ‘sub-exponential’.”

Freddie Sayers of Unherd interviews Michael Levitt (a Nobel-prize-winning non-epidemiologist) on purely statistical observations of the pattern of the epidemic. Given that the only way we have of measuring the effectiveness of government interventions is statistical, that’s interesting. The fun stuff (epidemiological and statistical) comes in deciding whether the correlation is causal. But there’s been no progress with that, in my opinion; in fact it is here that the epidemiological profession has disappointed me – it is as if epidemiology has forgotten everything it ever taught itself about causal inference. Against that background, this ought to give pause for thought.

Wall Street Journal: ‘Do Lockdowns Save Many Lives? In Most Places, the Data Say No’

I can’t vouch for the methodology here; I’m sharing for interest. To be honest, I’m sceptical about evidence on the effectiveness of lockdown in general – it’s going to be tough to figure out and may require a lengthy process. Anyway, I do predict we will see more of these kinds of claims, and even if they are flimsy, so, to be frank, are many of the claims made about locking down. Perhaps the most interesting thing going on right now is that there is a change in what seems obvious. Things that formerly spoke for themselves no longer do. From the perspective of someone who thinks about science, that is fascinating. It’s part of what Kuhn called a paradigm shift.

Lower COVID risk among smokers?… #epitwitter

Some evidence that some smokers may in fact have LESS serious symptoms than non-smokers. Interesting! I wonder whether this will hold water on further investigation. The researchers are now planning to test nicotine patches.

Predicting Pandemics: Lessons from (and for) COVID-19

This is a live online discussion between Jonathan Fuller and Alex Broadbent, hosted by the Institute for the Future of Knowledge in partnership with the Library of the University of Johannesburg. Comments and discussion are hosted on this page, and you can watch the broadcast here:

We know considerably more about COVID-19 than anyone has previously known about a pandemic of a new disease. Yet we are uncertain about what to do. Even where it appears obvious that strategies have worked or failed, it will take some time to establish that the observed trends are fully or even partly explained by anything we did or didn’t do. And when we take a lesson from one place and try to apply it in another, we have to contend with the huge differences between different places in the world, especially in age and wealth. This conversation explores these difficulties, in the hope of improving our response to the uncertainty that always accompanies pandemics, our ability to tell what works, our sensitivity to context, and thus our collective ability to arrive at considered decisions with clearly identified goals, based on a comprehensive assessment of the relevant costs, benefits, risks, and other factors.

Further reading:

Professor Alex Broadbent (PhD) is Director of the Institute for the Future of Knowledge and Professor of Philosophy at the University of Johannesburg. He specialises in prediction, causal inference, and explanation, especially in epidemiology and medicine. He publishes in major journals in philosophy, epidemiology, medicine and law, and his books include the path-breaking Philosophy of Epidemiology (Palgrave 2013) and Philosophy of Medicine (Oxford University Press 2019).

Dr Jonathan Fuller (PhD, MD) is a philosopher working in philosophy of science, especially philosophy of medicine. He is an Assistant Professor in the Department of History and Philosophy of Science (HPS) at the University of Pittsburgh, and a Research Associate with the University of Johannesburg. He is also on the International Philosophy of Medicine Roundtable Scientific Committee. He was previously a postdoctoral research fellow in the Institute for the History and Philosophy of Science at the University of Toronto.

Potential Outcomes Approach as “epidemiometrics”

In a review of Jan Tinbergen’s work, Maynard Keynes wrote:

At any rate, Prof. Tinbergen agrees that the main purpose of his method is to discover, in cases where the economist has correctly analysed beforehand the qualitative character of the causal relations, with what strength each of them operates… [1]

Nancy Cartwright cites this passage in the introduction to her Hunting Causes and Using Them [2], in the context of describing the business of econometrics. Her idea is that econometrics assumes economics can be an exact science, in which economic phenomena are governed by causal laws, and sets out to quantify those laws; this makes econometrics a fruitful domain for studying the connection between laws and causes.

This helped me with an idea that first occurred to me at the 9th Nordic Conference of Epidemiology and Register-Based Health Research, that the potential outcomes approach to causal inference in epidemiology might be understood as the foundational work of a sub-discipline within epidemiology, related to epidemiology as econometrics is to economics. We might call it epidemiometrics.

This suggestion appears to resonate with Tyler VanderWeele’s contention that:

A distinction should be drawn between under what circumstances it is reasonable to refer to something as a cause and under what circumstances it is reasonable to speak of an estimate of a causal effect… The potential outcomes framework provides a way to quantify causal effects… [3]

The distinction between causal identification and estimation of causal effects does not resolve the various debates around the POA in epidemiology, since the charge against the POA is that as an approach (the A part in POA) it is guilty of overreach. For example, the term “causal inference” is used prominently where “quantitative causal estimation” might be more accurate [4]. 

Maybe there is a lesson here from the history of economics. While the discipline of epidemiology does not pretend to uncover causal laws, as economics does, it nevertheless does seek to uncover causal relationships, at least sometimes. The Bradford Hill viewpoints are for answering a yes/no question: “is there any other way of explaining the facts before us, is there any other answer equally, or more, likely than cause and effect?” [5]. Econometrics answers a quantitative question: what is the magnitude of the causal effect, assuming that there is one? This question deserves its own discipline because, like any quantitative question, it admits of many more precise and non-equivalent formulations, and of the development of mathematical tools. The POA deserves to be recognised, not as an approach to epidemiological research, but as a discipline within epidemiology.

Many involved in discussions of the POA (including myself and co-authors) have made the point that the POA is part of a larger toolkit and that this is not always recognised [6,7], while others have argued that causal identification is a goal of epidemiology separate from causal estimation, and that it is at risk of neglect [8]. The emphasised components of these contentions – that the toolkit point is not always recognised, and that identification is at risk of neglect – do not in fact concern the business of discovering or estimating causality. They are points about the way epidemiology is taught, and how it is understood by those who practise it. They are points, not about causality, but about epidemiology itself.

A disciplinary distinction between epidemiology and a sub-discipline of epidemiometrics might assist in realising this distinction, which many are sensitive to, but which does not seem to have poured oil on the troubled waters of discussions of causality. By “realising”, I mean enabling institutional recognition at departmental or research unit level, enabling people to list their research interests on CVs and websites, assisting students in understanding the significance of the methods they are learning, and, most important of all, softening the dynamics between those who “advocate” and those who “oppose” the POA. To advocate econometrics over economics, or vice versa, would be nonsensical, like arguing that linear algebra is more or less important than mathematics. Likewise, to advocate or oppose epidemiometrics would be recognisably wrong-headed. There would remain questions about emphasis, completeness, relative distribution of time and resources – but not about which is the right way to achieve the larger goals.

Few people admit to “advocating” or “opposing” the methods themselves, because in any detailed discussion it immediately becomes clear that the methods are neither universally, nor never, applicable. A disciplinary distinction–or, more exactly, a distinction of a sub-discipline of study that contributes in a special way to the larger goals of epidemiology–might go a long way to alleviating the tensions that sometimes flare up, occasionally in ways that are unpleasant and to the detriment of the scientific and public health goals of epidemiology as a whole.

[1] J.M. Keynes, ‘Professor Tinbergen’s Method’, Economic Journal, 49, no. 195 (1939), 558-68.

[2] N. Cartwright, Hunting Causes and Using Them (New York: Cambridge University Press, 2007), 15.

[3] T. VanderWeele, ‘On causes, causal inference, and potential outcomes’, International Journal of Epidemiology, 45 (2016), 1809.

[4] M.A. Hernán and J.M. Robins, Causal Inference: What If (Boca Raton: Chapman & Hall/CRC, 2020).

[5] A. Bradford Hill, ‘The Environment and Disease: Association or Causation?’, Proceedings of the Royal Society of Medicine, 58 (1965), 299.

[6] J. Vandenbroucke, A. Broadbent, and N. Pearce, ‘Causality and causal inference in epidemiology: the need for a pluralistic approach’, International Journal of Epidemiology, 45 (2016), 1776-86.

[7] A. Broadbent, J. Vandenbroucke, and N. Pearce, ‘Response: Formalism or pluralism? A reply to commentaries on ‘Causality and causal inference in epidemiology”, International Journal of Epidemiology, 45 (2016), 1841-51.

[8] Schwartz et al., ‘Causal identification: a charge of epidemiology in danger of marginalization’, Annals of Epidemiology, 26 (2016), 669-673.

Masters, PhD and PostDoc opportunities at UJ

The University of Johannesburg has released a special call offering master’s, doctoral and postdoctoral fellowships, to start as soon as possible; deadline 8th February 2020.

These are in any area, but I would like to specifically invite anyone wishing to work with me (or colleagues at UJ) on any of the areas listed below. From May 2020, I will be Director of the Institute for the Future of Knowledge at UJ (a new institute – no website yet – but watch this space!), and being part of this enterprise will, I think, be very exciting for potential students/post-docs. I would be delighted to receive inquiries in any of the following areas:

  • Philosophy of medicine
  • Philosophy of epidemiology
  • Causation
  • Counterfactuals
  • Causal inference
  • Prediction
  • Explanation (not just causal)
  • Machine learning (in relation to any of the above)
  • Cognitive science
  • Other things potentially relevant to the Institute, my interests, your interests… please suggest!

If you’re interested please get in touch:

The call is here, along with instructions for applicants:

2020 Call for URC Scholarships for Master’s_Doctoral_Postdoctoral Fellowships_Senior Postdoctoral fellowships