Alan Jeffares

I’m a 4th-year Machine Learning PhD student at the University of Cambridge in the Department of Applied Mathematics. I am interested in building a better understanding of empirical phenomena in deep learning (e.g. double descent, optimization heuristics) and in developing methodological advances from these insights (e.g. deep ensembles, mixture-of-experts). I hold an MSc in Machine Learning from University College London and a BSc in Statistics from University College Dublin. Previously, I interned with the deep learning team at Microsoft Research, Redmond, and worked as a Data Scientist at Accenture’s global center for R&D innovation. Email: aj659 [at] cam [dot] ac [dot] uk.
🗞️ News 🗞️
June 2025 → My new paper has been accepted for an oral (🥳) presentation at ICML 2025. This work argues that many deep learning phenomena (double descent, grokking, lottery tickets) don’t appear in practical applications and therefore shouldn’t be treated as puzzles that need to be solved.
May 2025 → I have written a tutorial paper on discrete variational autoencoders. Optimizing through discrete latent spaces has become very popular recently (e.g. mixture-of-experts & VQ-VAE), and VAEs offer a nice introduction to this problem (a minimal sketch of one standard trick for this appears after the news items below).
September 2024 → New paper accepted at NeurIPS 2024! This paper develops a simplified model of a neural network to uncover insights into double descent, grokking, gradient boosting, and linear mode connectivity. Check out our Twitter threads for a bite-sized summary – part I & part II.
June 2024 → Excited to have begun my internship at Microsoft Research, Redmond for the summer, where I’ll be working on discrete optimization and mixture-of-experts models with the brilliant Lucas Liu and the deep learning team.
May 2024 → New paper accepted at ICML 2024! This paper tackles the task of estimating well-calibrated prediction intervals and proposes a simple alternative to quantile regression that relaxes its implicit assumption of a symmetric noise distribution. I will also present “Looking at Deep Learning Phenomena Through a Telescoping Lens” at the HiLD workshop.
September 2023 → Two papers accepted at NeurIPS 2023! One oral (top 0.5% of submissions) that offers an alternative take on double descent, suggesting that it may not be so at odds with classic statistical notions of model complexity. Then, a poster that investigates whether deep ensembles can be trained jointly rather than independently.
January 2023 → Two papers accepted! 🥳 One at AISTATS 2023 ([paper]) and one at ICLR 2023 ([paper]). These papers explore self-supervised learning for conformal prediction and a new regularizer for neural networks, respectively. I look forward to presenting these with my co-authors!
April 2022 → I have officially started a PhD in Machine Learning at the University of Cambridge under the supervision of Mihaela van der Schaar!
January 2022 → First paper accepted! 🎉 Work done during my master’s thesis under the supervision of Timos Moraitis and Pontus Stenetorp has been accepted as a spotlight (top 5% of submissions) at ICLR 2022. This paper took a neuroscience-inspired approach to improving the accuracy-efficiency trade-off in RNNs.
December 2021 → Graduated 🎓 I have officially graduated with an MSc in Machine Learning from UCL. I was also a recipient of a Dean’s list award for “outstanding academic performance”.
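
As a quick illustration of what “optimizing through discrete latent spaces” means (see the May 2025 item above), here is a minimal sketch of the straight-through Gumbel-softmax trick, a standard way to backpropagate through a categorical latent variable. This is a generic example rather than code from the tutorial paper; the dimensions, temperature, and placeholder loss are arbitrary choices for illustration.

```python
import torch

# Generic sketch of the straight-through Gumbel-softmax estimator for
# backpropagating through a discrete (categorical) latent variable.
torch.manual_seed(0)

batch_size, num_categories = 4, 8
logits = torch.randn(batch_size, num_categories, requires_grad=True)
temperature = 0.5

# Relaxed (soft) one-hot sample via Gumbel noise.
gumbel = -torch.log(-torch.log(torch.rand_like(logits)))
soft_sample = torch.softmax((logits + gumbel) / temperature, dim=-1)

# Straight-through: use the hard one-hot sample in the forward pass,
# but let gradients flow through the soft relaxation in the backward pass.
hard_sample = torch.nn.functional.one_hot(
    soft_sample.argmax(dim=-1), num_classes=num_categories
).float()
sample = hard_sample + (soft_sample - soft_sample.detach())

# Any downstream loss now provides gradients w.r.t. the logits.
loss = (sample ** 2).sum()
loss.backward()
print(logits.grad.shape)  # torch.Size([4, 8])
```

In a discrete VAE, `sample` would be passed to the decoder and the placeholder loss above would be replaced by the ELBO.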
📚 Selected Research 📚
Please find some of my publications below (a more up-to-date list can be found on Google Scholar).
“*” denotes equal contribution.
Conference
- A. Jeffares, M. van der Schaar. Not All Explanations for Deep Learning Phenomena Are Equally Valuable. ICML, 2025 - Oral (top 2%). [paper]
- A. Curth*, A. Jeffares*, M. van der Schaar. A U-turn on Double Descent: Rethinking Parameter Counting in Statistical Learning. NeurIPS, 2023 - Oral (top 0.5%). [paper] [code]
- A. Jeffares, Q. Guo, P. Stenetorp, T. Moraitis. Spike-inspired rank coding for fast and accurate recurrent neural networks. ICLR, 2022 - Spotlight (top 5%). [paper] [code]
- A. Jeffares, T. Liu, J. Crabbé, M. van der Schaar. Joint Training of Deep Ensembles Fails Due to Learner Collusion. NeurIPS, 2023. [paper] [code]
- A. Jeffares*, A. Curth, M. van der Schaar. Deep Learning Through A Telescoping Lens: A Simple Model Provides Empirical Insights On Grokking, Gradient Boosting & Beyond. NeurIPS, 2024. [paper]
- T. Pouplin*, A. Jeffares*, N. Seedat, M. van der Schaar. Relaxed Quantile Regression: Prediction Intervals for Asymmetric Noise. ICML, 2024. [paper] [code]
- N. Seedat*, A. Jeffares*, F. Imrie, M. van der Schaar. Improving Adaptive Conformal Prediction Using Self-Supervised Learning. AISTATS, 2023. [paper] [code]
- A. Jeffares*, T. Liu*, J. Crabbé, F. Imrie, M. van der Schaar. TANGOS: Regularizing Tabular Neural Networks through Gradient Orthogonalization and Specialization. ICLR, 2023. [paper] [code]
- A. Jeffares*, A. Curth, M. van der Schaar. Looking at Deep Learning Phenomena Through a Telescoping Lens. HiLD Workshop @ ICML, 2024. [paper]
Other
- A. Curth, A. Jeffares, M. van der Schaar. Why do Random Forests Work? Understanding Tree Ensembles as Self-Regularizing Adaptive Smoothers. [essay]
- A. Jeffares, L. Liu. An Introduction to Discrete Variational Autoencoders. [tutorial]