ICML 2024: Paper Review #2

24 September 2024

Quantitative research

Machine Learning (ML) is a fast-evolving discipline, so attending conferences and hearing about the very latest research is key to the ongoing development and success of our quantitative researchers and ML engineers.

In this paper review series, our ICML 2024 attendees reveal the research and papers they found most interesting.

Here, discover the perspectives of Machine Learning Engineer, Danny, as he discusses his most compelling findings from the conference.

Compute Better Spent: Replacing Dense Layers with Structured Matrices

Shikai Qiu, Andres Potapczynski, Marc Finzi, Micah Goldblum, Andrew Gordon Wilson

In this paper, the authors look for more compute-efficient replacements for dense linear layers, investigating structured alternatives such as low-rank matrices, Monarch matrices and Kronecker products.
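To make the potential saving concrete, here is a minimal NumPy sketch of two of these structures (low-rank and Kronecker product; Monarch matrices are left out for brevity). The function names and the example sizes are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def low_rank_matvec(U, V, x):
    """Apply W = U @ V to x without forming W. U: (d_out, r), V: (r, d_in)."""
    return U @ (V @ x)

def kron_matvec(A, B, x):
    """Apply (A kron B) to x without forming the Kronecker product.

    A: (m1, n1), B: (m2, n2), x: (n1 * n2,). Uses the row-major identity
    (A kron B) vec(X) = vec(A X B^T).
    """
    X = x.reshape(A.shape[1], B.shape[1])
    return (A @ X @ B.T).reshape(-1)

def layer_costs(d, r, k):
    """Parameters and multiply-adds per input vector for a d x d layer.

    r is the low-rank inner dimension; k * k == d gives two square
    k x k Kronecker factors.
    """
    assert k * k == d
    return {
        "dense": {"params": d * d, "macs": d * d},
        "low_rank": {"params": 2 * d * r, "macs": 2 * d * r},
        "kronecker": {"params": 2 * k * k, "macs": 2 * k ** 3},
    }

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A, B = rng.normal(size=(4, 3)), rng.normal(size=(5, 2))
    x = rng.normal(size=3 * 2)
    # Check the structured matvec against the explicit Kronecker product.
    print(np.allclose(kron_matvec(A, B, x), np.kron(A, B) @ x))
    print(layer_costs(d=1024, r=64, k=32))
```

For a 1024 x 1024 layer, these counts give roughly one million multiply-adds for the dense version, about 131 thousand for rank 64, and about 66 thousand for two 32 x 32 Kronecker factors. Whether the cheaper layers are also better per unit of compute is the empirical question the paper studies.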

The authors claim that these approaches have largely failed in the past because hyperparameters were poorly chosen when these alternatives were swapped in for dense linear layers.

To address this, they adapt the initialisation scheme derived from the maximal update parametrization (μP) work to support these structured matrices, and use it to tune some simple hyperparameters (such as the learning rate). They show that, by doing this, they achieve better test performance per FLOP on a number of tasks.
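As a rough illustration of what a width-aware scheme for a structured layer might look like, the sketch below treats each factor of a low-rank layer as its own small dense matrix and scales its settings by that factor's input dimension. The specific rules used here (initialisation standard deviation proportional to 1/sqrt(fan_in), Adam learning rate proportional to 1/fan_in) are a common rule of thumb for hidden layers and are an assumption; the exact scalings in the paper may differ, and all function names are hypothetical.

```python
import math

def mup_style_scaling(fan_in, base_fan_in, base_lr):
    """Rule-of-thumb init std and Adam learning rate for one dense factor.

    Assumption: std ~ 1/sqrt(fan_in), and the learning rate shrinks as
    1/fan_in relative to a base width at which base_lr was tuned.
    """
    init_std = 1.0 / math.sqrt(fan_in)
    lr = base_lr * base_fan_in / fan_in
    return init_std, lr

def low_rank_layer_settings(d_in, rank, base_fan_in, base_lr):
    """Treat each factor of W = U @ V as its own dense matrix.

    V maps d_in -> rank (fan_in = d_in); U maps rank -> d_out (fan_in = rank).
    """
    return {
        "V": mup_style_scaling(d_in, base_fan_in, base_lr),
        "U": mup_style_scaling(rank, base_fan_in, base_lr),
    }

settings = low_rank_layer_settings(d_in=1024, rank=64,
                                   base_fan_in=256, base_lr=1e-3)
for name, (std, lr) in settings.items():
    print(f"{name}: init_std={std:.4f}, lr={lr:.2e}")
```

The point of the sketch is that the two factors see very different input dimensions (1024 versus 64 here), so giving them the same settings as the dense layer they replace is unlikely to be a good choice for both.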

Emergent Equivariance in Deep Ensembles

Jan E. Gerken and Pan Kessel

In this work, the authors use the theory of neural tangent kernels to prove that ensembles of infinitely wide deep neural networks are equivariant at all stages of training if trained with full data augmentation.

The authors then demonstrate this property empirically for ensembles of wide and deep neural networks applied to several image classification tasks. In these cases, the neural network ensemble becomes equivariant to relevant symmetries in the data even when the underlying members of the ensemble do not display this property.
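One simple way to probe this kind of property in the classification setting, where equivariance reduces to invariance of the predicted class, is to check how often the prediction stays the same across all transformed copies of an input. The sketch below does this for the group of 90-degree rotations on square images. The metric name, the helper functions and the random linear "members" are illustrative assumptions; the members are untrained, so the printed numbers only demonstrate how the check is used and say nothing about the paper's result.

```python
import numpy as np

def c4_orbit(images):
    """Return the four 90-degree rotations of a batch of square images.

    images: (N, H, W) with H == W. Output shape: (4, N, H, W).
    """
    return np.stack([np.rot90(images, k=k, axes=(1, 2)) for k in range(4)])

def ensemble_logits(members, images):
    """Average the logits of all members. Each member maps (N, H, W) -> (N, C)."""
    return np.mean([m(images) for m in members], axis=0)

def orbit_agreement(predict, images):
    """Fraction of inputs whose predicted class is identical for all four rotations."""
    preds = np.stack([predict(rot).argmax(axis=1) for rot in c4_orbit(images)])
    return float(np.mean(np.all(preds == preds[0], axis=0)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    images = rng.normal(size=(100, 8, 8))
    # Hypothetical stand-ins for trained networks: random linear classifiers.
    weights = [rng.normal(size=(64, 3)) for _ in range(5)]
    members = [lambda x, W=W: x.reshape(len(x), -1) @ W for W in weights]
    print("member 0:", orbit_agreement(members[0], images))
    print("ensemble:", orbit_agreement(lambda x: ensemble_logits(members, x), images))
```

With trained members in place of the random ones, comparing the per-member score against the ensemble score is one way to see whether the ensemble is more symmetric than its individual members.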
