ML in epidemiology: thoughts on Keil and Edwards 2018

Found a great paper (that I should already have known about, of course): Keil, A.P., Edwards, J.K. You are smarter than you think: (super) machine learning in context. Eur J Epidemiol 33, 437–440 (2018).

I would recommend this really enjoyable article to philosophers of science, medicine, and epidemiology looking for interesting leads on the interaction between epidemiology and ML – as well as to the target audience, epidemiologists.

Here are some very brief, unfiltered thoughts.

  1. Keil and Edwards discuss an approach, “super learning”, that assembles the results of a bundle of different methods and returns the best (as defined by a user-specified, but objective, measure). In an example, they show how adding a method to that bundle can produce a worse result. Philosophically, this resonates with a familiar fact about non-deductive reasoning: adding information can “break” an inference, whereas adding information to the premise set of a deductive argument does not invalidate the inference, provided the additional information is consistent with what’s already there. Not sure what to make of the resonance yet, but it reminds me of counterexamples to deductive-nomological explanation – which is like ML in being formal.
  2. They point out that errors like this are reasonably easy for humans to spot, and conclude: “We should be cautious, however, that the billions of years of evolution and experience leading up to current levels of human intelligence is not ignored in the context of advances to computing in the last 30 years.” I suppose my question would be whether all such errors are easy for humans to spot, or whether only the ones we spot are easy to spot. Here, there is a connection with the general intellectual milieu around Kahneman and Tversky’s work on biases. We are indeed honed by evolution, but that honing leads us into error outside the domains it prepared us for, and statistical reasoning is one well-documented error zone for intuitive reasoning. I’m definitely not disagreeing with their scepticism about formal approaches, but I’m urging some even-handed scepticism about our intuitions. Where the machine and the human disagree, it seems to me a toss-up who, if either, is right.
  3. The assimilation of causal inference to a prediction problem is very useful and one I’ve also explored. It deserves wider appreciation among just about everyone. What would be nice is to see more discussion of predictions under intervention, which, according to some, are categorically different from other kinds of prediction. Will machine learning prove capable of making predictions about what will happen under interventions? If so, will this yield causal knowledge as a matter of definition, or could the resulting predictions be generated in a way that is epistemically opaque? Interventionism in philosophy, causal inference in epidemiology, and the “new science of cause and effect” might just see their ideas put to empirical test, if epidemiology picks up ML approaches in coming years. An intervention-supporting predictive algorithm that does not admit of a ready causal interpretation would force a number of books to be rewritten. Of course, according to those books, it should be impossible; but the potency of a priori reasoning about causation is, to say the least, disputed.
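
For readers who want something concrete to go with point 1, here is a minimal sketch of the "returns the best" version of super learning (usually called the discrete super learner; the full version instead forms a weighted combination of the candidates). It is not Keil and Edwards's example and does not reproduce their result. The simulated data, the three candidate learners, and all function names are my own assumptions, and the "user-specified, but objective, measure" is taken to be cross-validated mean squared error.

```python
import random

# --- Candidate learners (hypothetical, chosen only for illustration). ---
# Each takes training data and returns a prediction function.

def fit_mean(xs, ys):
    """Ignore x and always predict the training mean of y."""
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_linear(xs, ys):
    """Simple least-squares line y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x

def fit_nearest(xs, ys):
    """1-nearest-neighbour: predict the y of the closest training x."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def cv_mse(fit, xs, ys, k=5):
    """k-fold cross-validated mean squared error for one learner."""
    n = len(xs)
    total = 0.0
    for fold in range(k):
        test_idx = set(range(fold, n, k))  # every k-th point is held out
        train_x = [xs[i] for i in range(n) if i not in test_idx]
        train_y = [ys[i] for i in range(n) if i not in test_idx]
        predict = fit(train_x, train_y)
        total += sum((ys[i] - predict(xs[i])) ** 2 for i in test_idx)
    return total / n

def discrete_super_learner(library, xs, ys, k=5):
    """Return the name of the learner with the lowest CV risk, plus all risks."""
    risks = {name: cv_mse(fit, xs, ys, k) for name, fit in library.items()}
    best = min(risks, key=risks.get)
    return best, risks

# Simulated data (an assumption): y is linear in x plus Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [1.0 + 2.0 * x + rng.gauss(0, 0.5) for x in xs]

library = {"mean": fit_mean, "linear": fit_linear, "nearest": fit_nearest}
best_name, cv_risk = discrete_super_learner(library, xs, ys)
print(best_name, cv_risk)
```

The selection step is entirely mechanical: whichever candidate scores best on the chosen measure wins, whether or not a human would regard it as sensible for the problem. That is where the scope for the kind of error discussed in points 1 and 2 comes in, since enlarging `library` changes what the procedure can pick.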