NeurIPS Paper Reviews 2023 #1

19 January 2024
  • Quantitative Research

Our team of quantitative researchers have shared the most interesting research presented during workshops and seminars at NeurIPS 2023.

Discover the perspectives of Danny, one of our machine learning engineers, as he discusses his most compelling findings from the conference.

The G-Research booth at NeurIPS 2022

A U-turn on Double Descent: Rethinking Parameter Counting in Statistical Learning

Alicia Curth, Alan Jeffares, Mihaela van der Schaar

The double descent hypothesis is a relatively recent idea that attempts to reconcile the "bigger is better" practice of modern machine learning with the classical bias-variance trade-off. It states that in the overparameterized regime the traditional U-shaped curve of test error versus model complexity breaks down, and that generalization performance can continue to improve as the model parameter count grows. The regime in which the number of model parameters meets or exceeds the training set size is referred to as the interpolation region.

In this paper, the authors revisit the results of the original Belkin et al. (2019) paper, which observes double descent for Random Fourier Feature regression, decision tree ensembles and gradient-boosted trees. They claim that in each of these cases model complexity is increased along multiple axes (for example, splits per tree and number of trees for a tree ensemble), and that the double descent appears as an artefact of switching between these axes while increasing model complexity, rather than as a result of crossing the interpolation threshold (the point at which the number of model parameters equals the training set size). When test error is plotted against increasing model complexity along any single axis, the traditional U-shaped bias-variance curve is recovered.
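To see where the interpolation threshold sits in the simplest of these settings, here is a minimal, self-contained sketch of minimum-norm Random Fourier Feature regression with the feature count swept through the training set size. The toy data and hyperparameters are illustrative, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D regression task (illustrative, not the paper's data).
n_train, n_test = 50, 500
x_tr = rng.uniform(-1.0, 1.0, (n_train, 1))
x_te = rng.uniform(-1.0, 1.0, (n_test, 1))

def target(x):
    return np.sin(4.0 * x).ravel()

y_tr = target(x_tr) + 0.1 * rng.standard_normal(n_train)
y_te = target(x_te)

def rff(x, W, b):
    # Random Fourier Features: phi(x) = cos(x W + b).
    return np.cos(x @ W + b)

# Sweep the feature count through the interpolation threshold
# (n_feat == n_train). pinv gives the minimum-norm least-squares
# fit, which is what makes the overparameterized regime well-defined.
for n_feat in (5, 10, 25, 50, 100, 400, 1600):
    W = rng.standard_normal((1, n_feat))
    b = rng.uniform(0.0, 2.0 * np.pi, n_feat)
    w = np.linalg.pinv(rff(x_tr, W, b)) @ y_tr
    mse = np.mean((rff(x_te, W, b) @ w - y_te) ** 2)
    print(f"{n_feat:5d} features: test MSE = {mse:.4f}")
```

Plotting test error against the raw feature count in a setup like this typically shows the characteristic peak near n_feat == n_train, which is the curve the paper sets out to reinterpret.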

They go on to interpret each of these models as a "smoother" from the classical statistical literature, which allows them to derive an effective number of parameters for each model. They then reproduce and re-plot the results from the original paper against the effective parameter count, recovering the U-shaped curve in all cases. The obvious omission is the "deep double descent" case, which is suggested as the next direction for this work.
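For linear smoothers, the effective parameter count in question is the classical trace of the smoother matrix. A minimal sketch of that quantity for (ridge-regularized) least squares on a feature matrix Phi; this illustrates the general idea rather than the paper's derivations for trees:

```python
import numpy as np

def effective_params(Phi, lam=0.0):
    # Effective number of parameters trace(S) of the linear smoother
    # y_hat = S y. With ridge penalty lam > 0,
    # S = Phi (Phi^T Phi + lam I)^{-1} Phi^T; with lam == 0 we use the
    # pseudoinverse, so trace(S) = rank(Phi), which is capped at the
    # number of training points however many raw features there are.
    n, p = Phi.shape
    if lam > 0:
        S = Phi @ np.linalg.solve(Phi.T @ Phi + lam * np.eye(p), Phi.T)
    else:
        S = Phi @ np.linalg.pinv(Phi)
    return float(np.trace(S))
```

On the Random Fourier Feature example above, trace(S) plateaus at the training set size once the feature count crosses the interpolation threshold, so complexity measured this way never exceeds the data size; re-plotting against it is what recovers the U-shaped curve.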


Normalization Layers Are All That Sharpness-Aware Minimization Needs

Maximilian Müller, Tiffany Vlaar, David Rolnick, Matthias Hein

Sharpness-Aware Minimization (SAM) is a technique that attempts to improve generalization performance by seeking flatter regions of the loss landscape: rather than minimizing the training loss L(w) directly, it minimizes the worst-case loss in a neighbourhood of the current weights w.
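For reference, the SAM objective and the usual first-order approximation of its inner maximization (the standard formulation from the original SAM work, not something specific to this paper) are:

```latex
\min_{w}\; \max_{\|\epsilon\|_2 \le \rho} L(w + \epsilon),
\qquad
\hat{\epsilon}(w) = \rho \, \frac{\nabla_w L(w)}{\|\nabla_w L(w)\|_2}
```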

In this paper, the authors introduce SAM-ON (SAM-OnlyNorm), which applies the SAM perturbation only to the normalization parameters of the network. They find that SAM-ON achieves better generalization performance than the original SAM method (applied to all parameters) on ResNet architectures with BatchNorm and Vision Transformers with LayerNorm.
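A minimal PyTorch-style sketch of a single SAM-ON training step, assuming the normalization parameters can be identified by name. The name heuristic, hyperparameters and helper below are illustrative, not the authors' code:

```python
import torch

def sam_on_step(model, loss_fn, x, y, base_opt, rho=0.05):
    # Select only normalization parameters (heuristic name match;
    # illustrative, not the authors' selection logic).
    norm_params = [p for name, p in model.named_parameters()
                   if any(k in name.lower() for k in ("norm", "bn", "ln"))]

    # First pass: gradients at the current weights w.
    base_opt.zero_grad()
    loss_fn(model(x), y).backward()

    # Worst-case perturbation eps = rho * g / ||g||, computed over and
    # applied to the norm parameters only.
    grads = [p.grad for p in norm_params if p.grad is not None]
    grad_norm = torch.norm(torch.stack([g.norm() for g in grads]))
    eps = []
    with torch.no_grad():
        for p in norm_params:
            e = (rho * p.grad / (grad_norm + 1e-12)
                 if p.grad is not None else None)
            if e is not None:
                p.add_(e)
            eps.append(e)

    # Second pass: gradients at the perturbed point w + eps.
    base_opt.zero_grad()
    loss_fn(model(x), y).backward()

    # Undo the perturbation, then update all parameters as usual.
    with torch.no_grad():
        for p, e in zip(norm_params, eps):
            if e is not None:
                p.sub_(e)
    base_opt.step()
```

The only change relative to a standard SAM step is the restriction of the perturbation to the norm-layer parameters; the final optimizer step still updates the whole network.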

They investigate this further by measuring loss sharpness for both SAM and SAM-ON and find that SAM-ON actually finds regions with sharper minima, despite exhibiting better generalization performance. This supports claims from previous work that the generalization performance of SAM is not solely due to it finding flatter minima.


Read more of our quantitative researchers' thoughts

NeurIPS Paper Reviews 2023 #2

Discover the perspectives of Paul, one of our quantitative researchers, on the following papers:

  • Sharpness-Aware Minimization Leads to Low-Rank Features
  • When Do Neural Nets Outperform Boosted Trees on Tabular Data?
NeurIPS Paper Reviews 2023 #3

Discover the perspectives of Szymon, one of our quantitative researchers, on the following papers:

  • Convolutional State Space Models for Long-Range Spatiotemporal Modeling
  • How to Scale Your EMA
NeurIPS Paper Reviews 2023 #4

Discover the perspectives of Dustin, our scientific director, on the following papers:

  • Abide by the law and follow the flow: conservation laws for gradient flows
  • The Tunnel Effect: Building Data Representations in Deep Neural Networks
NeurIPS Paper Reviews 2023 #5

Discover the perspectives of Laurynas, one of our machine learning engineers, on the following papers:

  • Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture
  • QLoRA: Efficient Finetuning of Quantized LLMs
NeurIPS Paper Reviews 2023 #6

Discover the perspectives of Rui, one of our quantitative analysts, on the following papers:

  • Predict, Refine, Synthesize: Self-Guiding Diffusion Models for Probabilistic Time Series Forecasting
  • Conformal Prediction for Time Series with Modern Hopfield Networks
