A U-turn on Double Descent: Rethinking Parameter Counting in Statistical Learning
Alicia Curth, Alan Jeffares, Mihaela van der Schaar
The double descent hypothesis is a relatively recent idea that attempts to reconcile the "bigger is better" practice of modern machine learning with the classical bias-variance trade-off. It states that in the overparameterized regime the traditional U-shaped curve of test error against model complexity breaks down, and generalization performance can keep improving as the parameter count continues to grow. This regime, in which the number of model parameters is greater than or equal to the training set size, is referred to as the interpolation region.
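The toy sketch below (not from the paper; the target function, noise level and bandwidth are illustrative assumptions) shows the kind of experiment this refers to: minimum-norm random Fourier feature regression where the feature count is swept past the interpolation threshold (here 100 training points), so the test error can be inspected on either side of it.

```python
# Minimal double-descent-style sketch: minimum-norm RFF regression on a toy
# 1-D problem. All problem settings here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test = 100, 500
x_train = rng.uniform(-1, 1, size=(n_train, 1))
x_test = rng.uniform(-1, 1, size=(n_test, 1))
f = lambda x: np.sin(4 * np.pi * x[:, 0])           # assumed target function
y_train = f(x_train) + 0.3 * rng.standard_normal(n_train)
y_test = f(x_test)

def rff(x, omega, b):
    # Random Fourier features approximating an RBF kernel.
    return np.sqrt(2.0 / omega.shape[1]) * np.cos(x @ omega + b)

for n_feat in [10, 50, 100, 200, 1000, 5000]:        # 100 = interpolation threshold
    omega = rng.standard_normal((1, n_feat)) * 5.0   # assumed bandwidth
    b = rng.uniform(0, 2 * np.pi, size=n_feat)
    Phi_train, Phi_test = rff(x_train, omega, b), rff(x_test, omega, b)
    w = np.linalg.pinv(Phi_train) @ y_train          # minimum-norm least squares
    mse = np.mean((Phi_test @ w - y_test) ** 2)
    print(f"{n_feat:>5} features  test MSE = {mse:.3f}")
```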
In this paper, the authors revisit the results from the original Belkin et al. (2019) paper, which observed double descent for random Fourier feature regression, decision tree ensembles and gradient boosted trees. They argue that in each of these cases model complexity is increased along more than one axis (for example, splits per tree and number of trees for a tree ensemble), and that the double descent appears as an artefact of switching between these axes while increasing complexity, rather than as a consequence of crossing the interpolation threshold (where the number of model parameters equals the training set size). When test error is plotted against complexity along any single axis, the traditional U-shaped bias-variance curve is recovered.
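A rough sketch of that comparison for a tree ensemble is below. The data, the specific complexity schedules and the use of scikit-learn's RandomForestRegressor are assumptions for illustration, not the authors' exact setup: one sweep grows a single tree's leaf count (one axis), while the composite sweep switches to adding trees once individual trees are fully grown, mimicking how the original experiments combined the two axes.

```python
# Single-axis vs composite-axis complexity sweep for a tree ensemble.
# Schedules and data are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 5))
y = np.sin(3 * X[:, 0]) + 0.5 * rng.standard_normal(200)
X_test = rng.uniform(-1, 1, size=(1000, 5))
y_test = np.sin(3 * X_test[:, 0])

def test_mse(n_trees, max_leaves):
    model = RandomForestRegressor(n_estimators=n_trees,
                                  max_leaf_nodes=max_leaves,
                                  random_state=0).fit(X, y)
    return mean_squared_error(y_test, model.predict(X_test))

# Axis 1 only: leaves per tree, single tree -- classical U-shape expected here.
single_axis = [(1, leaves) for leaves in [2, 4, 8, 16, 32, 64, 128, 200]]
# Composite axis: grow leaves first, then start adding fully grown trees.
composite = single_axis + [(trees, 200) for trees in [2, 5, 10, 20, 50]]

for trees, leaves in composite:
    print(f"trees={trees:>3} leaves={leaves:>3}  test MSE={test_mse(trees, leaves):.3f}")
```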
They go on to interpret each of these models as a "smoother" in the sense of the classical statistical literature, which allows them to derive an effective number of parameters for each. They then reproduce the results from the original paper, re-plot them against the effective parameter count, and recover the U-shaped curve in all cases. The obvious omission is an investigation of the "deep double descent" case, which is suggested as the next direction for this work.
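For intuition, the snippet below computes the textbook effective-parameter count for a linear smoother, namely the trace of the smoothing (hat) matrix S in y_hat = S y, applied to the RFF setup from the first sketch. This is the classical notion only; the paper works with a generalized variant of it, so treat this purely as an illustration of why raw parameter count and effective parameter count can diverge once the feature count exceeds the training set size.

```python
# Effective parameters of a linear smoother: trace of the hat matrix.
# Uses the same assumed RFF setup as the earlier sketch.
import numpy as np

rng = np.random.default_rng(0)
n_train = 100
x_train = rng.uniform(-1, 1, size=(n_train, 1))

def rff(x, omega, b):
    return np.sqrt(2.0 / omega.shape[1]) * np.cos(x @ omega + b)

for n_feat in [10, 50, 100, 1000, 5000]:
    omega = rng.standard_normal((1, n_feat)) * 5.0
    b = rng.uniform(0, 2 * np.pi, size=n_feat)
    Phi = rff(x_train, omega, b)
    S = Phi @ np.linalg.pinv(Phi)          # training-set smoothing (hat) matrix
    # trace(S) = rank(Phi): it grows with the raw feature count only up to n_train.
    print(f"{n_feat:>5} raw parameters -> effective parameters = {np.trace(S):.1f}")
```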