ML in epidemiology: thoughts on Keil and Edwards 2018

Found a great paper (that I should already have known about, of course): Keil, A.P., Edwards, J.K. You are smarter than you think: (super) machine learning in context. Eur J Epidemiol 33, 437–440 (2018). https://doi.org/10.1007/s10654-018-0405-9

It is a really enjoyable article, which I would recommend to philosophers of science, medicine, and epidemiology looking for interesting leads on the interaction between epidemiology and ML – as well as to its target audience, epidemiologists.

Here are some very brief, unfiltered thoughts.

  1. Keil and Edwards discuss an approach, “super learning”, that assembles the results of a bundle of different methods and returns the best (as defined by a user-specified, but objective, measure). In an example, they show how adding a method to that bundle can produce a worse result. Philosophically, this resonates with familiar facts about non-deductive reasoning, namely that as you add information, you can “break” an inference, whereas adding information to the premise set of a deductive argument does not invalidate the inference, provided the additional information is consistent with what’s already there. I’m not sure what to make of the resonance yet, but it reminds me of counterexamples to deductive-nomological explanation – which is like ML in being formal.
  2. They point out that errors like this are reasonably easy for humans to spot, and conclude: “We should be cautious, however, that the billions of years of evolution and experience leading up to current levels of human intelligence is not ignored in the context of advances to computing in the last 30 years.” I suppose my question would be whether all such errors are easy for humans to spot, or whether only the ones we spot are easy to spot. Here, there is a connection with the general intellectual milieu around Kahneman and Tversky’s work on biases. We are indeed honed by evolution, but this leads us into error outside the specific domains we were honed for, and statistical reasoning is one well-documented error zone for intuitive reasoning. I’m definitely not disagreeing with their scepticism about formal approaches, but I’m urging some even-handed scepticism about our intuitions. Where the machine and the human disagree, it seems to me a toss-up who, if either, is right.
  3. The assimilation of causal inference to a prediction problem is very useful and one I’ve also explored. It deserves wider appreciation among just about everyone. It would be good to see more discussion of predictions under intervention, which, according to some, are categorically different from other kinds of prediction. Will machine learning prove capable of making predictions about what will happen under interventions? If so, will this yield causal knowledge as a matter of definition, or could the resulting predictions be generated in a way that is epistemically opaque? Interventionism in philosophy, causal inference in epidemiology, and the “new science of cause and effect” might just see their ideas put to empirical test, if epidemiology picks up ML approaches in coming years. An intervention-supporting predictive algorithm that does not admit of a ready causal interpretation would force a number of books to be rewritten. Of course, according to those books, it should be impossible; but the potency of a priori reasoning about causation is, to say the least, disputed.
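
Point 1 can be made concrete with a minimal sketch of a discrete super learner: each candidate method is scored by K-fold cross-validated mean squared error, and the lowest-scoring candidate is refitted on all the data and returned. This is a simplifying assumption for illustration (the full super learner usually returns a weighted combination of candidates rather than a single winner), and the three learners, the simulated data and all function names are hypothetical, not Keil and Edwards' example. It does not reproduce their worse-result case; it only shows the mechanism behind it, namely that the output depends on the whole bundle, so adding a candidate can displace the previous winner.

```python
import random
from statistics import mean


def fit_mean(xs, ys):
    """Candidate: ignore x entirely and predict the training mean."""
    m = mean(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Candidate: ordinary least squares with a single predictor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)


def fit_nearest(xs, ys):
    """Candidate: 1-nearest-neighbour, a very flexible learner."""
    points = list(zip(xs, ys))
    return lambda x: min(points, key=lambda p: abs(p[0] - x))[1]


LEARNERS = {"mean": fit_mean, "line": fit_line, "nearest": fit_nearest}


def cv_risk(fit, xs, ys, k=5):
    """K-fold cross-validated mean squared error of one candidate."""
    n = len(xs)
    errors = []
    for fold in range(k):
        held_out = set(range(fold, n, k))
        train = [i for i in range(n) if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errors += [(model(xs[i]) - ys[i]) ** 2 for i in held_out]
    return mean(errors)


def discrete_super_learner(library, xs, ys, k=5):
    """Score every candidate by CV risk; refit the best on all data and return it."""
    risks = {name: cv_risk(fit, xs, ys, k) for name, fit in library.items()}
    best = min(risks, key=risks.get)
    return best, library[best](xs, ys), risks


# Hypothetical data: a linear signal plus Gaussian noise (fixed seed).
rng = random.Random(0)
xs = [i / 5 for i in range(50)]
ys = [2 * x + rng.gauss(0, 0.5) for x in xs]

if __name__ == "__main__":
    for names in (["mean"], ["mean", "nearest"], ["mean", "nearest", "line"]):
        library = {name: LEARNERS[name] for name in names}
        best, _, risks = discrete_super_learner(library, xs, ys)
        print(names, "->", best, {name: round(r, 3) for name, r in risks.items()})
```

Each bundle can return a different winner from the same data. The case that worries Keil and Edwards is the one where an added candidate wins the cross-validation yet produces output a human would immediately recognise as wrong.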

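Point 3's distinction between ordinary prediction and prediction under intervention can be illustrated with a toy model, all of whose numbers are hypothetical: a binary common cause Z raises both the probability of exposure X and the probability of outcome Y. Predicting Y among people observed to have X = 1 conditions on X, which also shifts the distribution of Z; predicting Y if X were set to 1 for everyone leaves Z's distribution alone. The second quantity is computed here with the standard adjustment (back-door) formula, which is valid only because the model stipulates that Z is the sole confounder.

```python
from fractions import Fraction

# Hypothetical structural model with binary variables:
#   Z -> X, Z -> Y, X -> Y  (Z confounds the X-Y relation)
P_Z1 = Fraction(1, 2)                                   # P(Z=1)
P_X1_GIVEN_Z = {0: Fraction(1, 5), 1: Fraction(4, 5)}   # P(X=1 | Z=z)


def p_y1(x, z):
    """P(Y=1 | X=x, Z=z): exposure adds 0.1, the confounder adds 0.5."""
    return Fraction(1, 10) + Fraction(1, 10) * x + Fraction(1, 2) * z


def p_z(z):
    return P_Z1 if z == 1 else 1 - P_Z1


def p_x(x, z):
    return P_X1_GIVEN_Z[z] if x == 1 else 1 - P_X1_GIVEN_Z[z]


def observational(x):
    """P(Y=1 | X=x): what a model predicting Y from X alone, fitted to these data, targets."""
    joint = sum(p_z(z) * p_x(x, z) * p_y1(x, z) for z in (0, 1))
    marginal = sum(p_z(z) * p_x(x, z) for z in (0, 1))
    return joint / marginal


def interventional(x):
    """P(Y=1 | do(X=x)): average over Z's unchanged distribution."""
    return sum(p_z(z) * p_y1(x, z) for z in (0, 1))


if __name__ == "__main__":
    print("observed difference:", observational(1) - observational(0))          # 2/5
    print("interventional difference:", interventional(1) - interventional(0))  # 1/10
```

The observed contrast (2/5) is four times the effect of intervening (1/10). A learner that only ever sees observational data is rewarded for learning the first; whether it could reliably deliver the second without an explicitly causal representation is the open question.
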
The role of philosophers in the coronavirus pandemic

What is the point of philosophy? That’s a question many philosophers struggle with, not just because it is difficult to answer. That goes for many academic disciplines, including “hard” sciences and applied disciplines like economics. However, unlike physicists and economists, philosophers ought to be able to answer this question, in the perception of many. And many of us can’t, at least to our own satisfaction.

I’ve written some opinion pieces (1,2) and given some interviews during this period, and I know of a handful of other philosophers who have done so (like Benjamin Smart, Arthur Caplan, and Stefano Canali). However, I also know of philosophers who have expressed frustration at the “uselessness” of philosophy in times like these. At the same time, I’ve seen an opinion piece by a computer scientist, whose expert contribution is confined to the nature of exponential growth: something that all of us with a basic mathematical education have studied, and which anyone subject to a compound interest rate, for example through a mortgage, will have directly experienced.

Yet computer science hasn’t covered itself in glory in this epidemic. Machine learning publications claiming to be able to arrive at predictive models in a matter of weeks have been notably lacking in this episode, confirming, for me, the view that machine learning and epidemiology have yet to interact meaningfully. Why do computer scientists (only one, admittedly; most of them are surely more sensible) and philosophers have such different levels of confidence at pronouncing on matters beyond their expertise?

There are no experts on the COVID-19 pandemic

This pandemic is subject to nobody’s expertise. It’s a novel situation, and expertise is remarkably useless when things change, as economists discovered in 2008 and pollsters in 2016.

Of course, parts of the current situation fall within the domains of various experts. Infectious disease epidemiologists can predict its spread. But there is considerably more to this pandemic than predicting its spread. In particular, the prediction of the difference that interventions make requires a grasp of causal inference that is a distinct skill set from that of the prediction of a trend, as proponents of the potential outcomes approach have correctly pointed out. Likewise, the attribution, after the fact, of a certain outcome to an intervention only makes good sense when we know what course of action we are comparing that intervention with; and this may be underspecified, because the “would have died otherwise” trend is so hard to establish.
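
The dependence of attribution on the comparator can be shown with a toy calculation, all numbers hypothetical. The deaths attributed to an intervention are the deaths in the comparison scenario minus the deaths that actually occurred. If the comparison scenario is projected unchecked exponential growth, the answer swings enormously with the assumed doubling time, a quantity that is hard to pin down early in an epidemic.

```python
def projected_daily_deaths(deaths_day0, doubling_days, day):
    """Daily deaths on `day` if growth continued unchecked with a fixed doubling time (an assumption)."""
    return deaths_day0 * 2 ** (day / doubling_days)


def attributed_to_intervention(observed, counterfactual):
    """Deaths attributed to the intervention: comparison scenario minus what actually happened."""
    return counterfactual - observed


observed_day30 = 200  # hypothetical daily deaths actually recorded on day 30

for doubling in (3, 4, 5):
    counterfactual = projected_daily_deaths(10, doubling, 30)
    averted = attributed_to_intervention(observed_day30, counterfactual)
    print(f"doubling every {doubling} days: {averted:,.0f} daily deaths attributed")
```

Same observed outcome, same intervention: the attributed figure runs from 10,040 daily deaths with a three-day doubling time down to 440 with a five-day one, depending only on the assumed comparator trend.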

Non-infectious-disease epidemiologists may understand the conceptual framework, methodology, terminology and pitfalls of the current research on the pandemic, but they do not necessarily have better subject-specific expertise than many in public health, the medical field, or others with a grasp on epidemiological principles. Scientists from other disciplines may be worse than the layperson because, like the computer scientist just mentioned, they wrongly assume that their expertise is relevant, and in doing so either simplify the issue to a childish extent, or make pronouncements that are plain wrong. (Epidemiology is, in my view, widely under-respected by other scientists.)

Turning to economics and politics, economists can predict the outcome of a pandemic or of measures to control it only if they have input from infectious disease epidemiologists on the predictive claims whose impacts they are seeking to assess.

Moreover, the health impact of economic policies is well studied by epidemiologists, and to some extent by health economists; but these researchers are not typically knowledgeable about the epidemiology of infectious disease outbreaks of this nature.

Jobs for philosophers

In this situation, my opinion is that philosophers can contribute substantially. My own thinking has been around cost-benefit analysis of public health interventions, and in particular the neglected health impact, in very different global locations, of the boilerplate measures being recommended to combat the virus. This is obviously a lacuna, and especially pressing for me as I sit writing this in my nice study in Johannesburg, where most people do not have a nice study. Africa is always flirting with famine (there are people who will regard this as an insult; it is not). Goldman Sachs is predicting a 24% decline (at an annualized rate) in US GDP next quarter.

If this does not cost lives in Africa, it will be remarkable. It might even cost more lives than the virus would, in a region where only 3% are over 65 (and there’s no evidence that HIV status makes a difference to outcomes of COVID-19). South Africa is weeks into the epidemic and saw its first two deaths just today.

Yet the epidemiological community (at least on my Twitter feed) has entirely ignored the health consequences of interventions, merely pointing out that the virus will have its own economic impact even without interventions, which is like justifying the Bay of Pigs by pointing out that Castro would have killed people even without the attempted invasion. And context is nearly totally ignored. The discipline appears mostly to have fallen in behind the view that the stronger the measure, the more laudable. Weirdly, those who usually press for more consideration of social angles seem no less in favour, despite the fact that they spend most of the rest of their time arguing that poverty is wrongly neglected as a cause of ill-health.

Do I sound disappointed in the science that I’m usually so enthusiastic about, and that shares with philosophy the critical study of the unknown? Here we have a virus that may well claim a larger death toll in richer countries with older populations, and a set of measures that are designed by and for those countries, and a total lack of consideration of local context. Isn’t this remarkable?

There is more to say, and many objections; I’ll write this up in an academically rigorous way as soon as I can. Meanwhile, I’ll continue to publish opinion pieces, where I think it’s useful. Right now, my point is that there’s a lot for philosophers to dissect here, not only in this particular problem but in the pandemic as a whole. And the points don’t have to be rocket science. They can be as simple as recommending that a ban on the sale of cigarettes be lifted.

What is required for us to be useful, however, is that we apply our critical thinking skills to the issue at hand. Falling in with common political groupings adds nothing unique and requires the suspension of the same critical faculties that we philosophers pride ourselves on in other contexts. This is a situation where nearly all the information on which decisions are being made is publicly available, where none of it is the exclusive preserve of a single discipline, and where fear clouds rational thought. Expert analyses of specific technical problems are also readily available. These are ideal conditions for someone trained to apply analytic skills in a relatively domain-free manner to contribute usefully.

Off the top of my head, here are a handful of topic ideas:

  • How to circumscribe the consequences of COVID-19 that we are interested in when devising our measures of intervention (this is an ethical spin on the issue I’m interested in above)
  • The nature of good prediction (which I’ve worked on in the public health context – but there is so much more to say)
  • The epistemology of testimony, especially concerning expertise, in a context of minimal information (to get us past the “trust the scientists FFS” dogma – that’s an actual quote from Twitter)
  • The weighing of the rights of different groups, given the trade-off between young and old deaths (COVID-19 kills almost no children, while they will die in droves in a famine)

One’s own expertise will suggest other topics, provided that the effort is to think critically rather than simply identify people with whom one agrees. I very much hope that we will not see a straightforward application of existing topics: inductive risk and coronavirus; definition of health and coronavirus; rights and coronavirus; etc. To be clear, I’m not saying that no treatment of coronavirus can mention inductive risk, definition of health, or rights; just that the treatment must start with coronavirus. My motto in working on the philosophy of epidemiology is that my work is philosophical in character but epidemiological in subject: it is philosophical work about epidemiology. Where it suggests modifications to existing debates in philosophy, as does happen, that is great, but it’s not the purpose. The idea is to identify new problems, not to cast old ones in a new light. Perhaps there are no such things as new philosophical problems; but then again, perhaps it’s only by trying to identify new problems that we can cast new light on old ones.

Call to arms

The skill of philosophers, and the value in philosophy, does not lie in our knowledge of debates that we have had with each other. It lies in our ability to think fruitfully about the unfamiliar, the disturbing, the challenging, and even the abhorrent. The coronavirus pandemic is all these things. Let’s get stuck in.

America Tour: Attribution, prediction, and the causal interpretation problem in epidemiology

Next week I’ll be visiting America to talk in Pittsburgh, Richmond, and twice at Tufts. I do not expect audience overlap so I’ll give the same talk in all venues, with adjustments for audience depending on whether it’s primarily philosophers or epidemiologists I’m talking to. The abstract is below. I haven’t got a written version of the paper that I can share yet but would of course welcome comments at this stage.

ABSTRACT

Attribution, prediction, and the causal interpretation problem in epidemiology

In contemporary epidemiology, there is a movement, part theoretical and part pedagogical, attempting to discipline and clarify causal thinking. I refer to this movement as the Potential Outcomes Approach (POA). It draws inspiration from the work of Donald Rubin and, more recently, Judea Pearl, among others. It is most easily recognized by its use of Directed Acyclic Graphs (DAGs) to describe causal situations, but DAGs are not the conceptual basis of the POA in epidemiology. The conceptual basis (as I have argued elsewhere) is a commitment to the view that the hallmark of a meaningful causal claim is that it can be used to make predictions about hypothetical scenarios. Elsewhere I have argued that this commitment is problematic (notwithstanding the clear connections with counterfactual, contrastive and interventionist views in philosophy). In this paper I take a more constructive approach, seeking to address the problem that troubles advocates of the POA. This is the causal interpretation problem (CIP). We can calculate various quantities that are supposed to be measures of causal strength, but it is not always clear how to interpret these quantities. Measures of attributability are most troublesome here, and these are the measures on which POA advocates focus. What does it mean, they ask, to say that a certain fraction of population risk of mortality is attributable to obesity? The pre-POA textbook answer is that, if obesity were reduced, mortality would be correspondingly lower. But this is not obviously true, because there are methods for reducing obesity (smoking, cholera infection) which will not reduce mortality. In general, say the POA advocates, a measure of attributability tells us next to nothing about the likely effect of any proposed public health intervention, rendering these measures useless, and so, for epidemiological purposes, meaningless.
In this paper I ask whether there is a way to address and resolve the causal interpretation problem without resorting to the extreme view that a meaningful causal claim must always support predictions in hypothetical scenarios. I also seek connections with the notorious debates about heritability.
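
The worry about the textbook answer can be checked with toy numbers (all hypothetical, with "risk" meaning mortality probability over some fixed period). The population attributable fraction is computed here in its usual form, (P(D) - P(D | unexposed)) / P(D). It equals the proportional drop in mortality only if every exposed person ends up at exactly the unexposed risk. The hypothetical parameter side_effect_rr stands for an obesity-reducing method, such as smoking, that carries risks of its own.

```python
def total_risk(prevalence, risk_exposed, risk_unexposed):
    """Population risk as an exposure-weighted average of group risks."""
    return prevalence * risk_exposed + (1 - prevalence) * risk_unexposed


def paf(prevalence, risk_exposed, risk_unexposed):
    """Population attributable fraction: (P(D) - P(D | unexposed)) / P(D)."""
    total = total_risk(prevalence, risk_exposed, risk_unexposed)
    return (total - risk_unexposed) / total


def risk_after_removing_exposure(prevalence, risk_unexposed, side_effect_rr):
    """Population risk once every exposed person becomes unexposed by a method that
    multiplies their risk by side_effect_rr (1.0 = a method with no risks of its own)."""
    formerly_exposed_risk = risk_unexposed * side_effect_rr
    return total_risk(prevalence, formerly_exposed_risk, risk_unexposed)


# Hypothetical numbers: 30% obese, mortality risk 2% if obese and 1% if not.
p, r1, r0 = 0.3, 0.02, 0.01
before = total_risk(p, r1, r0)
print(f"PAF: {paf(p, r1, r0):.3f}")
for method, rr in (("side-effect-free method", 1.0), ("smoking (hypothetical RR 2.5)", 2.5)):
    after = risk_after_removing_exposure(p, r0, rr)
    print(f"{method}: relative change in mortality {(after - before) / before:+.3f}")
```

The side-effect-free method reproduces the PAF as a reduction of about 23%; the smoking route eliminates the same obesity yet raises mortality by about 12%. The PAF itself is identical in both cases.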

Workshop, Helsinki: What do diseases and financial crises have in common?

AID Forum: “Epidemiology: an approach with multidisciplinary applicability”

(Unfamiliar with AID forum? For the very idea and the programme of Agora for Interdisciplinary Debate, see www.helsinki.fi/tint/aid.htm)

DISCUSSED BY:

Mervi Toivanen (economics, Bank of Finland)

Jaakko Kaprio (genetic epidemiology, U of Helsinki)

Alex Broadbent (philosophy of science, U of Johannesburg)

Moderated by Academy professor Uskali Mäki

Session jointly organised by TINT (www.helsinki.fi/tint) and the Finnish Epidemiological Society (www.finepi.org)

TIME AND PLACE:

Monday 9 February, 16:15-18

University Main Building, 3rd Floor, Room 5

http://www.helsinki.fi/teknos/opetustilat/keskusta/f33/ls5.htm

TOPIC: What do diseases and financial crises have in common?

Epidemiology has traditionally been used to model the spread of diseases in populations at risk. By applying parameters related to agents’ responses to infection and their networks of contacts, it helps us study how diseases occur, why they spread and how epidemic outbreaks could be prevented. For decades, epidemiology has also studied non-communicable diseases, such as cancer, cardiovascular disease, addictions and accidents. Descriptive epidemiology focuses on providing accurate information on the occurrence (incidence, prevalence and survival) of the condition. Etiological epidemiology seeks to identify the determinants, be they infectious agents, environmental or social exposures, or genetic variants. A central goal is to identify determinants amenable to intervention, and hence prevention of disease.
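
The traditional infectious-disease use is often introduced through the SIR compartmental model, sketched minimally here. S, I and R are the fractions of the population susceptible, infectious and recovered; beta is a transmission rate standing in (under an assumption of homogeneous mixing) for the contact network, and gamma is a recovery rate standing in for responses to infection. The parameter values are hypothetical, and simple Euler integration is chosen for transparency rather than accuracy.

```python
def simulate_sir(beta, gamma, s0, i0, days, steps_per_day=10):
    """Euler integration of dS/dt = -beta*S*I, dI/dt = beta*S*I - gamma*I, dR/dt = gamma*I.
    Returns one (S, I, R) tuple per day, as population fractions."""
    dt = 1 / steps_per_day
    s, i, r = s0, i0, 1 - s0 - i0
    history = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i * dt
            new_recoveries = gamma * i * dt
            s, i, r = s - new_infections, i + new_infections - new_recoveries, r + new_recoveries
        history.append((s, i, r))
    return history


if __name__ == "__main__":
    for beta in (0.3, 0.05):  # basic reproduction number R0 = beta / gamma = 3.0 or 0.5
        run = simulate_sir(beta, gamma=0.1, s0=0.99, i0=0.01, days=200)
        peak = max(i for _, i, _ in run)
        print(f"R0={beta / 0.1:.1f}: peak infectious {peak:.3f}, ever infected {1 - run[-1][0]:.3f}")
```

With R0 = beta/gamma above 1 the infection takes off; below 1 it dies out. Network models of the kind behind "super-spreaders" relax the homogeneous-mixing assumption built into beta.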

There is thus a need to consider both reverse causation and confounding as possible alternative explanations to a causal one. Novel designs are providing new tools to address these issues. But epidemiology also provides an approach that has broad applicability to a number of domains covered by multiple disciplines. For example, it is widely and successfully used to explain the propagation of computer viruses, macroeconomic expectations and rumours in a population over time.
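
As a minimal sketch of the macroeconomic case, loosely in the spirit of Carroll's "epidemiology of expectations" (listed in the readings below): each period a fraction lam of households "catch" the latest professional forecast from the news, and the rest keep their previous expectation, so the average household expectation adjusts gradually. The single-equation form, the name lam and the numbers are simplifying assumptions for illustration, not a reproduction of Carroll's model or estimates.

```python
def household_expectations(forecasts, lam, initial):
    """Average household expectation over time when, each period, a fraction `lam`
    of households adopts the current professional forecast and the rest keep their
    old view: M_t = lam * F_t + (1 - lam) * M_{t-1}."""
    expectation = initial
    path = []
    for forecast in forecasts:
        expectation = lam * forecast + (1 - lam) * expectation
        path.append(expectation)
    return path


if __name__ == "__main__":
    # Hypothetical: professionals revise their inflation forecast from 1% to 3%.
    path = household_expectations([3.0] * 12, lam=0.25, initial=1.0)
    print([round(m, 2) for m in path])
```

Here lam plays the role of an infection rate: the larger it is, the faster professional views spread through the population of households.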

As a consequence, epidemiological concepts such as “super-spreader” have found their way also to economic literature that deals with financial stability issues. There is an obvious analogy between the prevention of diseases and the design of economic policies against the threat of financial crises. The purpose of this session is to discuss the applicability of epidemiology across various domains and the possibilities to mutually benefit from common concepts and methods.
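
Why "super-spreaders" matter can be sketched with a standard result for random networks with a given degree distribution (the configuration model): if each contact transmits with probability T, the expected number of secondary cases from a typical case is T(<k²> - <k>)/<k>, where k is a node's number of contacts. The two degree sequences below are hypothetical. They have the same average number of contacts, but concentrating contacts in a few highly connected nodes (roughly the role of highly connected institutions in the financial-network readings below) pushes the system past the epidemic threshold of 1.

```python
from statistics import mean


def network_r0(degrees, transmissibility):
    """Expected secondary cases on a configuration-model network:
    T * (<k^2> - <k>) / <k>, where <.> averages over the degree sequence."""
    k1 = mean(degrees)
    k2 = mean([k * k for k in degrees])
    return transmissibility * (k2 - k1) / k1


uniform = [4] * 100               # everyone has 4 contacts
skewed = [2] * 90 + [22] * 10     # same mean (4), but 10 "super-spreaders"

if __name__ == "__main__":
    for name, degrees in (("uniform", uniform), ("skewed", skewed)):
        print(f"{name}: mean contacts {mean(degrees)}, R0 {network_r0(degrees, 0.2):.2f}")
```

With the same per-contact transmissibility (0.2) and the same mean number of contacts, the uniform network gives R0 = 0.6 and the skewed one R0 = 2.4.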

QUESTIONS:

1. Why is epidemiology so broadly applicable?

2. What similarities and differences prevail between these various disciplinary applications?

3. What can they learn from one another, and could cooperation between disciplines be enhanced?

4. How could the endorsement of concepts and ideas across disciplines be improved?

5. Can epidemiology help to resolve causality?

READINGS:

Alex Broadbent, Philosophy of Epidemiology (Palgrave Macmillan 2013)

http://www.palgrave.com/page/detail/?sf1=id_product&st1=535877

Alex Broadbent’s blog on the philosophy of epidemiology:

https://philosepi.wordpress.com/

Rothman KJ, Greenland S, Lash TL. Modern Epidemiology, 3rd edition. Lippincott, Philadelphia 2008.

D’Onofrio BM, Lahey BB, Turkheimer E, Lichtenstein P. Critical need for family-based, quasi-experimental designs in integrating genetic and social science research. Am J Public Health. 2013 Oct;103 Suppl 1:S46-55. doi:10.2105/AJPH.2013.301252.

Taylor AE, Davies NM, Ware JJ, VanderWeele T, Smith GD, Munafò MR. Mendelian randomization in health research: Using appropriate genetic variants and avoiding biased estimates. Economics and Human Biology 2014;13:99-106.

Engholm G, Ferlay J, Christensen N, Kejs AMT, Johannesen TB, Khan S, Milter MC, Ólafsdóttir E, Petersen T, Pukkala E, Stenz F, Storm HH. NORDCAN: Cancer Incidence, Mortality, Prevalence and Survival in the Nordic Countries, Version 7.0 (17.12.2014). Association of the Nordic Cancer Registries. Danish Cancer Society. Available from http://www.ancr.nu.

Andrew G. Haldane, Rethinking the financial network; Speech by Mr Haldane, Executive Director, Financial Stability, Bank of England, at the Financial Student Association, Amsterdam, 28 April 2009: http://www.bis.org/review/r090505e.pdf

Antonios Garas et al., Worldwide spreading of economic crisis: http://iopscience.iop.org/1367-2630/12/11/113043/pdf/1367-2630_12_11_113043.pdf

Christopher D. Carroll, The epidemiology of macroeconomic expectations: http://www.econ2.jhu.edu/people/ccarroll/epidemiologySFI.pdf

Causation, prediction, epidemiology – talks coming up

Perhaps an odd thing to do, but I’m posting the abstracts of my two next talks, which will also become papers. Any offers to discuss/read welcome!

The talks will be at Rhodes on 1 and 3 October. I’ll probably deliver a descendant of one of them at the Cambridge Philosophy of Science Seminar on 3 December, and may also give a very short version of 1 at the World Health Summit in Berlin on 22 Oct.

1. Causation and Prediction in Epidemiology

There is an ongoing “methodological revolution” in epidemiology, according to some commentators. The revolution is prompted by the development of a conceptual framework for thinking about causation called the “potential outcomes approach”, and the mathematical apparatus of directed acyclic graphs that accompanies it. But once the mathematics are stripped away, a number of striking assumptions about causation become evident: that a cause is something that makes a difference; that a cause is something that humans can intervene on; and that epidemiologists need nothing more from a notion of causation than picking out events satisfying those two criteria. This is especially remarkable in a discipline that has variously identified factors such as race and sex as determinants of health. In this talk I seek to explain the significance of this movement in epidemiology, separate its insights from its errors, and draw a general philosophical lesson about confusing causal knowledge with predictive knowledge.

2. Causal Selection, Prediction, and Natural Kinds

Causal judgements are typically – invariably – selective. We say that striking the match caused it to light, but we do not mention the presence of oxygen, the ancestry of the striker, the chain of events that led to that particular match being in her hand at that time, and so forth. Philosophers have typically but not universally put this down to the pragmatic difficulty of listing the entire history of the universe every time one wants to make a causal judgement. The selective aspect of causal judgements is typically thought of as picking out causes that are salient for explanatory or moral purposes. A minority, including me, think that selection is more integral than that to the notion of causation. The difficulty with this view is that it seems to make causal facts non-objective, since selective judgements clearly vary with our interests. In this paper I seek to make a case for the inherently selective nature of causal judgements by appealing to two contexts where interest-relativity is clearly inadequate to fully account for selection. Those are the use of causal judgements in formulating predictions, and the relation between causation and natural kinds.

Absolute and relative measures – what’s the difference?

I’m re-working a paper on risk relativism in response to some reviewer comments, and also preparing a talk on the topic for Friday’s meeting at KCL, “Prediction in Epidemiology and Healthcare”. The paper originates in Chapter 8 of my book, where I identify some possible explanations for “risk relativism” and settle on the one I think is best. Briefly, I suggest that there isn’t really a principled way of distinguishing “absolute” and “relative” measures, and instead explain the popularity of relative risk by its superficial similarity to a law of physics, and its apparent independence of any given population. These appearances are misleading, I suggest.

In the paper I am trying to develop the suggestion a bit into an argument. Two remarks by reviewers point me in the direction of further work I need to do. One is the question as to what, exactly, the relation between RR and law of nature is supposed to be. Exactly what character am I supposing that laws have, or that epidemiologists think laws have, such that RR is more similar to a law-like statement than, say, risk difference, or population attributable fraction?
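
The contrast the reviewers are pressing on can at least be displayed with toy numbers. Below, two hypothetical populations share the same relative risk but differ in baseline risk and exposure prevalence; the risk difference and population attributable fraction then differ between them while RR does not. The example builds the constancy of RR in by assumption. Whether RR really is stable across populations is exactly what is at issue, so this shows what the law-like appearance amounts to, not that it is warranted.

```python
def measures(baseline_risk, rr, prevalence):
    """Return (relative risk, risk difference, population attributable fraction)
    for a population where the unexposed risk is baseline_risk, exposure multiplies
    risk by rr, and a fraction `prevalence` is exposed."""
    exposed_risk = baseline_risk * rr
    risk_difference = exposed_risk - baseline_risk
    total_risk = prevalence * exposed_risk + (1 - prevalence) * baseline_risk
    attributable_fraction = (total_risk - baseline_risk) / total_risk
    return exposed_risk / baseline_risk, risk_difference, attributable_fraction


# Two hypothetical populations with the same RR of 2.
populations = {"A": (0.01, 2.0, 0.1), "B": (0.10, 2.0, 0.5)}

if __name__ == "__main__":
    for name, args in populations.items():
        rr, rd, af = measures(*args)
        print(f"population {name}: RR={rr:.2f}  RD={rd:.3f}  PAF={af:.3f}")
```

RR is 2 in both; RD is 0.01 versus 0.1, and PAF is 1/11 versus 1/3.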

The other is a reference to a literature I don’t know but certainly should, concerning statistical modelling in the social sciences. I am referred to a monograph by Achen in 1982, and a paper by Jan Vandenbroucke in 1987, both of which suggest – I gather – a deep scepticism about statistical modelling in the social sciences. Particularly thought-provoking is the idea that all such models are “qualitative descriptions of data”. If there is any truth in that, then it is extremely significant, and deserves unearthing in the age of big data, Google Analytics, Nate Silver, and generally the increasing confidence in the possibility of accurately modelling real world situations, and – crucially – generating predictions out of them.

A third question concerns the relation between these two thoughts: (i) the apparent law-likeness of certain measures contrasted with the apparently population-specific, non-general nature of others; and (ii) the limitations claimed for statistical modelling in some quarters contrasted with confidence in others. I wonder whether degree of confidence has anything to do with perceived law-likeness. One’s initial reaction would be to doubt this: when Nate Silver adjusts his odds on a baseball outcome, he surely does not take himself to be basing his prediction on a law-like generalisation. Yet on reflection, he must be basing it on some generalisation, since the move from observed to unobserved is a kind of generalising. What more, then, is there to the notion of a law, than generalisability on the basis of instances? It is surprising how quickly the waters deepen.

Relative Activity in philosepi

Having neglected this blog for several months I find myself suddenly swamped with things to write about. My book has been translated into Korean by Hyundeuk Cheon, Hwang Seung-sik, and Mr Jeon, and judging by their insightful comments and questions they have done a superb and careful job. Next week there is a workshop on Prediction in Epidemiology and Healthcare at KCL, organised by Jonathan Fuller and Luis Jose Flores, which promises to be exciting. Coming up in August is the World Congress of Epidemiology, where I’m giving two talks, hopefully different ones – one on stability for a session on translation and public engagement, and one on the definition of measures of causal strength as part of a session for the next Dictionary of Epidemiology. And I’m working on a paper on risk relativism which has been accepted by Journal of Epidemiology and Community Health subject to revisions in response to the extremely interesting comments of 5 reviewers – I think this is possibly the most rigorous and most useful review process I have encountered. Thus this is a promissory note, by which I hope to commit my conscience to writing here about risk relativism, stability and measures of causal strength in the coming weeks.