NeurIPS Paper Reviews 2023 #4

23 January 2024
  • Quantitative Research

Our team of quantitative researchers has shared the most interesting research presented during workshops and seminars at NeurIPS 2023.

Discover the perspectives of our scientific director Dustin as he discusses his most compelling findings from the conference.

Abide by the law and follow the flow: conservation laws for gradient flows

Sibylle Marcotte, Remi Gribonval, Gabriel Peyré

Think of neural network training as a dynamical system obeying the laws of classical mechanics. The loss function L plays the role of a potential energy surface, and the NN weights W follow trajectories of steepest descent according to “laws of motion” given by the differential equation dW/dt = -k * dL/dW, where k > 0 is a constant (this is gradient flow, the continuous-time limit of gradient descent). The authors show that the NN weights obey conservation laws, much like conservation of energy in classical mechanics: certain functions of the weights stay constant throughout training.
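The link between the flow and a conserved quantity is a one-line chain-rule computation. Writing h(W) for a generic smooth function of the weights (our notation), along the flow dW/dt = -k * dL/dW:

```latex
\frac{d}{dt}\, h\big(W(t)\big)
  = \Big\langle \nabla h(W),\, \frac{dW}{dt} \Big\rangle
  = -k \,\big\langle \nabla h(W),\, \nabla L(W) \big\rangle
```

So h is conserved along every trajectory exactly when its gradient is orthogonal to the loss gradient at every point of weight space. As we read the paper, the authors ask for this to hold for any training data, which makes a conservation law a property of the architecture rather than of a particular dataset.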

For example, for a 1-dimensional, 2-layer ReLU network f(x) = u * ReLU(v * x) with two weights u and v, there is one conserved quantity h = u^2 - v^2. This means the initial choice of weights matters: the whole trajectory, including the final state, is confined to the curve u^2 - v^2 = h fixed at initialization. This builds on previous work (Zhao 2022), which argues that these conservation laws induce an inductive bias towards “flat” minima of the loss function, which reduces overfitting and makes training more robust.
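To see this conservation law in action, here is a minimal sketch assuming the 1-d network is f(x) = u * ReLU(v * x), trained with a squared loss on hypothetical data (positive inputs, targets y = 1.5x). Because ReLU(z) = z * ReLU'(z), one can check that u * dL/du = v * dL/dv, so d/dt (u^2 - v^2) = -2k (u * dL/du - v * dL/dv) = 0. Small-step gradient descent stands in for the continuous flow (with k = 1); the function names are ours.

```python
import numpy as np

def loss_and_grads(u, v, x, y):
    """Loss 0.5 * mean((u * relu(v * x) - y)^2) and its gradients w.r.t. u and v."""
    pre = v * x
    act = np.maximum(pre, 0.0)                  # ReLU
    r = u * act - y                             # residuals
    loss = 0.5 * np.mean(r ** 2)
    grad_u = np.mean(r * act)
    grad_v = np.mean(r * u * (pre > 0) * x)     # ReLU derivative taken as 0 for z <= 0
    return loss, grad_u, grad_v

def gradient_descent(u, v, x, y, lr=2e-3, steps=10_000):
    """Small-step gradient descent: an Euler discretisation of dW/dt = -dL/dW."""
    for _ in range(steps):
        _, grad_u, grad_v = loss_and_grads(u, v, x, y)
        u, v = u - lr * grad_u, v - lr * grad_v
    return u, v

rng = np.random.default_rng(0)
x = rng.uniform(0.1, 2.0, size=50)              # hypothetical 1-d inputs (all positive)
y = 1.5 * x                                     # hypothetical target: slope 1.5

# Two initialisations with the same product u * v but opposite values of h.
for u0, v0 in [(0.3, 1.0), (1.0, 0.3)]:
    u, v = gradient_descent(u0, v0, x, y)
    loss, _, _ = loss_and_grads(u, v, x, y)
    print(f"init h = {u0**2 - v0**2:+.4f}  final h = {u**2 - v**2:+.4f}  "
          f"u*v = {u * v:.4f}  loss = {loss:.2e}")
```

Both runs should fit the same function (u * v close to 1.5) yet end at different weights, because each keeps its own value of h. With a finite step size lr, h is only approximately preserved: each update changes it by lr^2 * (grad_u^2 - grad_v^2), so the drift vanishes as lr goes to 0.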

The paper gives an involved procedure for computing the conserved quantities of more complex networks, but the slides include some nice pictures illustrating the 1-d example. I like it because it is a neat way to understand NN training using ideas from physics. It also suggests that bigger NNs with more parameters might work well.
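The paper's general procedure is beyond a short example, but a well-known matrix analogue of the 1-d case is easy to check numerically. For a two-layer linear network f(x) = W2 W1 x trained with squared loss, the matrix W1 W1^T - W2^T W2 is conserved under gradient flow (the "balancedness" property from the deep linear network literature; this is a separate, simpler result, not the paper's procedure). The sizes, data and teacher matrix A below are hypothetical, chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden, d_out, n = 3, 4, 2, 20          # hypothetical sizes
X = rng.normal(size=(d_in, n))                  # inputs stored as columns
A = rng.normal(size=(d_out, d_in))              # hypothetical "teacher" matrix
Y = A @ X                                       # targets

def loss_and_grads(W1, W2):
    """L = 0.5/n * ||W2 W1 X - Y||_F^2 and its gradients w.r.t. W1 and W2."""
    E = W2 @ W1 @ X - Y                         # residuals, shape (d_out, n)
    loss = 0.5 * np.sum(E ** 2) / n
    grad_W1 = W2.T @ E @ X.T / n
    grad_W2 = E @ X.T @ W1.T / n
    return loss, grad_W1, grad_W2

def conserved(W1, W2):
    """Candidate conserved quantity: W1 W1^T - W2^T W2 (d_hidden x d_hidden)."""
    return W1 @ W1.T - W2.T @ W2

W1 = 0.5 * rng.normal(size=(d_hidden, d_in))
W2 = 0.5 * rng.normal(size=(d_out, d_hidden))
loss0, _, _ = loss_and_grads(W1, W2)
C0 = conserved(W1, W2)

lr = 2e-3                                       # small step, so GD approximates gradient flow
for _ in range(10_000):
    _, g1, g2 = loss_and_grads(W1, W2)
    W1, W2 = W1 - lr * g1, W2 - lr * g2

loss1, _, _ = loss_and_grads(W1, W2)
drift = np.linalg.norm(conserved(W1, W2) - C0)  # Frobenius norm of the change
print(f"loss {loss0:.3f} -> {loss1:.2e}, drift of W1 W1^T - W2^T W2: {drift:.1e}")
```

The loss should fall substantially while the conserved matrix barely moves; as in the scalar case, the remaining drift comes from the finite step size.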

The Tunnel Effect: Building Data Representations in Deep Neural Networks

Wojciech Masarczyk, Mateusz Ostaszewski, Ehsan Imani, Razvan Pascanu, Piotr Miłoś, Tomasz Trzcinski 

In a deep 18-layer neural network for image classification, the layers split into two groups with distinct roles. The first 8 layers act as a feature “extractor” and account for most of the network's predictive power. The remaining 10 layers act as a “tunnel”, which compresses the intermediate activation vectors into a low-dimensional embedding.
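One common way to measure how much predictive power each layer carries is a linear probe: freeze the network, fit a linear classifier on that layer's activations, and compare its accuracy with the final layer's. A minimal sketch follows; the least-squares probe, the 99% threshold and both function names are our assumptions, and the authors' exact probing setup may differ.

```python
import numpy as np

def linear_probe_accuracy(train_feats, train_labels, test_feats, test_labels, num_classes):
    """Fit a least-squares linear classifier (with bias) on one-hot targets using the
    training features, then return its accuracy on the held-out features."""
    def with_bias(F):
        return np.hstack([F, np.ones((F.shape[0], 1))])
    targets = np.eye(num_classes)[train_labels]          # one-hot, shape (n_train, num_classes)
    W, *_ = np.linalg.lstsq(with_bias(train_feats), targets, rcond=None)
    preds = np.argmax(with_bias(test_feats) @ W, axis=1)
    return float(np.mean(preds == test_labels))

def tunnel_start(probe_accuracies, frac=0.99):
    """Index of the first layer whose probe accuracy reaches `frac` times the last
    layer's accuracy: under this criterion the extractor ends here and the
    later layers add little accuracy."""
    target = frac * probe_accuracies[-1]
    for i, acc in enumerate(probe_accuracies):
        if acc >= target:
            return i
    return len(probe_accuracies) - 1
```

In practice one would call `linear_probe_accuracy` on the activations of every layer and pass the resulting list of accuracies to `tunnel_start`.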

According to the authors, the “extractor” attains >99% of the final prediction accuracy, and the numerical rank of the weight matrices in the “tunnel” collapses to log(d), where d is the number of output classes. The authors run a number of experiments, such as combining an “extractor” trained on one task with a “tunnel” trained on a different task. They show that the “extractor” is task-specific but the “tunnel” is the same for both tasks.
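The rank collapse is stated in terms of numerical rank. A common way to compute it, which we assume here (the paper's exact definition and tolerance may differ), is to count the singular values above a small fraction of the largest one. The function name and the default `rel_tol` are hypothetical.

```python
import numpy as np

def numerical_rank(M, rel_tol=1e-3):
    """Number of singular values of M larger than rel_tol times the largest one."""
    s = np.linalg.svd(M, compute_uv=False)      # singular values, in descending order
    if s.size == 0 or s[0] == 0:
        return 0
    return int(np.sum(s > rel_tol * s[0]))

# Usage: a 64 x 64 matrix that is rank 2 up to tiny noise.
rng = np.random.default_rng(0)
M = rng.normal(size=(64, 2)) @ rng.normal(size=(2, 64)) + 1e-6 * rng.normal(size=(64, 64))
print(numerical_rank(M))  # prints 2
```

A relative tolerance keeps the measure independent of the overall scale of the weights, which matters when comparing layers whose norms differ.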

I like it because it offers a nice, practical way to understand NN training dynamics, and it seems to arrive at a meaningful interpretation. I would be curious to see whether this holds for other architectures and datasets. I also like the use of intermediate metrics (like the numerical rank of intermediate layers) to probe what’s happening during training.
