ML in epidemiology: thoughts on Keil and Edwards 2018

Found a great paper (that I should already have known about, of course): Keil, A.P., Edwards, J.K. You are smarter than you think: (super) machine learning in context. Eur J Epidemiol 33, 437–440 (2018). https://doi.org/10.1007/s10654-018-0405-9

I would recommend this really enjoyable article to philosophers of science, medicine, and epidemiology looking for interesting leads on the interaction between epidemiology and ML – as well as to its target audience, epidemiologists.

Here are some very brief, unfiltered thoughts.

  1. Keil and Edwards discuss an approach, “super learning”, that assembles the results of a bundle of different methods and returns the best (as defined by a user-specified, but objective, measure). In an example, they show how adding a method to that bundle can result in a worse result. Philosophically, this resonates with familiar facts about non-deductive reasoning, namely that as you add information, you can “break” an inference, whereas adding information to the premise set of a deductive argument does not invalidate the inference provided the additional information is consistent with what’s already there. I’m not sure what to make of the resonance yet, but it reminds me of counterexamples to deductive-nomological explanation – which is like ML in being formal.
  2. They point out that errors like this are reasonably easy for humans to spot, and conclude: “We should be cautious, however, that the billions of years of evolution and experience leading up to current levels of human intelligence is not ignored in the context of advances to computing in the last 30 years.” I suppose my question would be whether all such errors are easy for humans to spot, or whether only the ones we spot are easy to spot. Here, there is a connection with the general intellectual milieu around Kahneman and Tversky’s work on biases. We are indeed honed by evolution, but this leads us to error outside of our specific domain, and statistical reasoning is one well-documented error zone for intuitive reasoning. I’m definitely not disagreeing with their scepticism about formal approaches, but I’m urging some even-handed scepticism about our intuitions. Where the machine and the human disagree, it seems to me a toss-up who, if either, is right.
  3. The assimilation of causal inference to a prediction problem is a very useful move, and one I’ve also explored. It deserves wider appreciation among just about everyone. What would be nice is to see more discussion of prediction under intervention, which, according to some, is categorically different from other kinds of prediction. Will machine learning prove capable of making predictions about what will happen under interventions? If so, will this yield causal knowledge as a matter of definition, or could the resulting predictions be generated in a way that is epistemically opaque? Interventionism in philosophy, causal inference in epidemiology, and the “new science of cause and effect” might just see their ideas put to empirical test, if epidemiology picks up ML approaches in coming years. An intervention-supporting predictive algorithm that does not admit of a ready causal interpretation would force a number of books to be rewritten. Of course, according to those books, it should be impossible; but the potency of a priori reasoning about causation is, to say the least, disputed.
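The selection step described in point 1 can be sketched as a “discrete” super learner: fit each candidate model, score each by cross-validated error, and return the winner. The candidate models, the simulated data, and the contiguous fold scheme below are my own illustrative assumptions, not taken from Keil and Edwards’ paper; this is a sketch of the general idea, not their implementation.

```python
# Minimal sketch of a "discrete" super learner: score a bundle of candidate
# models by K-fold cross-validated MSE and select the one with the lowest.
# Candidates, data, and fold scheme are illustrative assumptions.
import numpy as np

def kfold_cv_mse(x, y, fit, predict, k=5):
    """Average mean-squared error of one candidate over k contiguous CV folds."""
    idx = np.arange(len(x))
    errs = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)          # all indices not in this fold
        model = fit(x[train], y[train])
        pred = predict(model, x[fold])
        errs.append(np.mean((y[fold] - pred) ** 2))
    return float(np.mean(errs))

def super_learn(x, y, candidates, k=5):
    """Return (name of best candidate, dict of CV scores)."""
    scores = {name: kfold_cv_mse(x, y, fit, predict, k)
              for name, (fit, predict) in candidates.items()}
    return min(scores, key=scores.get), scores

# Simulated data: a truly linear signal plus noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 80)
y = 2.0 * x + rng.normal(0, 0.3, size=x.size)

# Bundle of candidates: each is a (fit, predict) pair.
candidates = {
    "mean":   (lambda xs, ys: np.mean(ys),
               lambda m, xs: np.full(xs.shape, m)),
    "linear": (lambda xs, ys: np.polyfit(xs, ys, 1),
               lambda m, xs: np.polyval(m, xs)),
    "poly9":  (lambda xs, ys: np.polyfit(xs, ys, 9),   # overfits the noise
               lambda m, xs: np.polyval(m, xs)),
}

best, scores = super_learn(x, y, candidates)
```

On this toy data the linear candidate wins, but nothing in the procedure guarantees that enlarging the bundle cannot degrade the selected model’s out-of-sample performance – which is the failure mode the paper’s example illustrates.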

From Judea Pearl’s blog: report of a webinar: “Artificial Intelligence and COVID-19: A wake-up call” #epitwitter @TheBJPS

Check the entry on Pearl’s blog, which includes a write-up provided by the organisers.

Video of the event is available too.

IFK Panel 27 May: Data and Delusion after COVID-19 – Shakir Mohammed (Google DeepMind), Charis Harley (UJ Engineering), Olaf Dammann (Tufts Public Health and Community Medicine) https://universityofjohannesburg.us/4ir/covid-19-webinar-3/ #epitwitter @mediauj

Please join us for a panel discussion on Data and Delusion after COVID-19, Wednesday 27 May @ 1pm South Africa, W Europe | 12 noon UK | 7am US East Coast | 7pm Beijing, China. Please “arrive” (log in) 15 minutes beforehand to allow time for you to be admitted prior to the event, as we admit participants individually for security reasons. We start on the hour sharp. To join, you first need to register.

Panelists:

  • Dr. Shakir Mohammed is a Senior Researcher at Google DeepMind in London, United Kingdom (UK).
  • Professor Charis Harley is an academic based in the Faculty of Engineering and the Built Environment at the University of Johannesburg (UJ), South Africa.
  • Professor Olaf Dammann is Vice-Chair of Public Health at Tufts University in Boston, United States (US), Professor of Perinatal Neuroepidemiology at Hannover Medical School, Germany, and Adjunct Professor in the Department of Neuromedicine and Movement Science at the Norwegian University of Science and Technology in Trondheim, Norway.

Facilitated by Professor Alex Broadbent, Director of the Institute for the Future of Knowledge at the University of Johannesburg

Please register if you wish to watch this live. A recording will also be posted afterwards.

This is the third in a series of webinars on Reimagining the World After COVID-19, organised by the Institute for the Future of Knowledge in collaboration with the UJ Library and Information Centre on the initiative of the Vice Chancellor’s Office at the University of Johannesburg.

Data and delusion after COVID-19

An epidemic has a single centre from which disease spreads: an epicentre. A pandemic is what happens when the disease no longer spreads from a single centre but circulates and spreads throughout the population. The COVID-19 pandemic has been accompanied by a pandemic of data. Data is offered, analysed, re-packaged and criticised by mighty international organisations and by tiny local outfits. Even private individuals with no prior expertise or interest in data, disease, or statistics spend hours poring over graphs and critiquing case fatality estimates.

Yet this proliferation of data and analysis has not yielded effective predictions. Instead, it has demonstrated how ill-equipped we are to deal with this new, non-hierarchical, distributed information context. Leading scientists have been proved dramatically wrong. Or perhaps not – it depends who you ask. The unfolding pattern of spread still surprises us at every turn – except those who predicted it all along. Nothing is more common than the common cold, and coronavirus variants are one of its causes: yet we seem unable to make reliable predictions about COVID-19.

This webinar explores a range of issues relating to data and trust in science in the aftermath of COVID-19. What went wrong with the modelling approach to prediction – if, indeed, anything did go wrong? How should policy and scientific research interact, and how should policy makers make use of data? Can people without domain-specific knowledge use data to predict better than the experts in that domain? If not, then can data analysts themselves make predictions merely by studying patterns in data? Turning to the generation of data, how does the individual interest in privacy weigh against the public interest in private information, notably location, which can be very useful in the context of a pandemic?

Our improved data processing abilities did not help us as much as we might have imagined in this situation. Machine learning, in particular, thrives on spotting complex patterns in noisy datasets, and doing so fast; yet it has been conspicuously absent from the efforts to predict the course of this pandemic.

Register here