NeurIPS Paper Reviews 2023 #3

23 January 2024
  • Quantitative Research

Our team of quantitative researchers has shared the most interesting research presented during workshops and seminars at NeurIPS 2023.

Discover the perspectives of quantitative researcher Szymon, as he discusses his most compelling findings from the conference.

G-Research NeurIPS Booth

Convolutional State Space Models for Long-Range Spatiotemporal Modeling

Jimmy T.H. Smith, Shalini De Mello, Jan Kautz, Scott W. Linderman, Wonmin Byeon

Linear state space models are a classical tool in signal processing and control theory. In recent years they have also become an object of interest for machine learning researchers, after it was discovered that composing properly structured and initialized models of this type with nonlinear activations yields a new kind of scalable and efficient sequential architecture. Such architectures, which can be thought of as sophisticated linear RNNs, have shown impressive performance on a number of classification tasks that require processing long sequences, as well as on audio processing, and have recently achieved state-of-the-art results in language modelling at the scale of a few billion parameters. The model proposed in this paper is a state space model architecture designed for processing spatiotemporal data.
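
To make the recurrence concrete, here is a minimal sketch of one discrete linear SSM layer followed by a nonlinearity; the names and shapes (A, B, C) are illustrative and not taken from any particular paper.

```python
import numpy as np

def ssm_layer(u, A, B, C):
    """Run the linear recurrence x_{k+1} = A x_k + B u_k, y_k = C x_k over a sequence u."""
    x = np.zeros(A.shape[0])
    ys = []
    for u_k in u:                 # naive sequential scan; practical SSM layers parallelize this
        x = A @ x + B @ u_k       # linear state update
        ys.append(C @ x)          # linear readout
    return np.tanh(np.stack(ys))  # nonlinearity applied between stacked linear SSM layers
```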

The key idea is to use the convolution operation to define the transition, input and output maps of the state space model. This is analogous to using convolution to define the step of ConvRNNs such as ConvLSTM and ConvGRU, which are commonly used in spatiotemporal modelling. Since convolution is a linear operation, this definition results in a linear state space model with a distinct structure. The authors opt to restrict the state transition convolution to be pointwise, and integrate this design variant into the previously proposed S5 layer. Furthermore, they leverage initialization schemes from previous SSM models and implement an efficient parallel scan for the proposed layer. The resulting architecture, named ConvS5, combines the fast stateful inference and training cost linear in sequence length of RNN models with the parallelizability of transformers. The proposed model matches or outperforms state-of-the-art models on a number of spatiotemporal benchmarks, while also comparing favourably to them in terms of required computational resources.
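
To illustrate, the sketch below shows a single hypothetical ConvSSM-style state update on one frame, with a pointwise (1x1, per-channel) state transition and a convolutional input map; the names, shapes and SciPy-based implementation are our own illustration, not the authors' code.

```python
import numpy as np
from scipy.signal import convolve2d

def convssm_step(x, u, a, B_kernels):
    """One ConvSSM-style update of a spatial state.

    x: state, shape (C, H, W)
    u: input frame, shape (C_in, H, W)
    a: per-channel transition weights, shape (C,); a pointwise (1x1) transition
       keeps the recurrence diagonal and amenable to an efficient parallel scan
    B_kernels: input convolution kernels, shape (C, C_in, k, k)
    """
    new_x = a[:, None, None] * x                 # pointwise state transition
    for c in range(B_kernels.shape[0]):          # convolutional input map: B * u
        for ci in range(B_kernels.shape[1]):
            new_x[c] += convolve2d(u[ci], B_kernels[c, ci], mode="same")
    return new_x
```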

Given that video is the next frontier of generative AI, I am looking forward to seeing how well generative video models with a ConvSSM backbone are going to perform.

How to Scale Your EMA

Dan Busbridge, Jason Ramapuram, Pierre Ablin, Tatiana Likhomanenko, Eeshan Gunesh Dhekane, Xavier Suau, Russ Webb

A model EMA is a copy of a given machine learning model whose parameters are an exponential moving average, taken along the optimization trajectory, of the original model's parameters. This object is employed in machine learning for a variety of purposes. In supervised learning, using the model EMA instead of the last iterate of the training process often improves stability and generalization. It is also commonly used as the teacher model in semi-supervised and self-supervised learning.
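
Concretely, the EMA update applied after each optimizer step looks like the following framework-agnostic sketch:

```python
def ema_update(ema_params, params, momentum):
    """One EMA step: ema <- momentum * ema + (1 - momentum) * current parameters."""
    return [momentum * e + (1.0 - momentum) * p
            for e, p in zip(ema_params, params)]
```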

A performance-critical hyperparameter of the model EMA is its momentum, a number between zero and one chosen by the practitioner. In the era of very large models, hyperparameter tuning is performed on smaller models, and the hyperparameters for the larger model are then chosen from the results of this tuning using appropriate scaling rules. The authors of this paper derive a practical and theoretically grounded guideline for how the EMA momentum should be chosen when the batch size is scaled. The key result is that as the batch size is multiplied by K, the EMA momentum should be raised to the power K.
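
As a worked example of the rule, with illustrative numbers rather than values from the paper:

```python
base_momentum = 0.9999  # momentum tuned at the original batch size (illustrative)
K = 8                   # factor by which the batch size is scaled up
scaled_momentum = base_momentum ** K
print(f"{scaled_momentum:.6f}")  # ~0.999200: the EMA moves faster per step,
                                 # compensating for K times fewer steps per epoch
```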

The authors derive this scaling rule by analysing the limiting SDE corresponding to the dynamics of SGD at various batch sizes, and validate it through thorough empirical testing on several tasks. The empirical results are especially impressive: the authors recover almost perfectly matching training curves for most of the supervised learning experiments, even when the batch size is scaled by a factor of 256. For the pseudo-labelling and self-supervised learning problems, recovering the training dynamics turns out to be more difficult, as it is hard to replicate the dynamics of the base model in the early phases of training, especially at large batch size. The authors nevertheless manage a very close replication with simple interventions.

I really appreciated this paper for providing an elegant result that is easy to understand and implement, with broad applicability.

Read more of our quantitative researchers' thoughts

NeurIPS Paper Reviews 2023 #1

Discover the perspectives of Danny, one of our machine learning engineers, on the following papers:

  • A U-turn on Double Descent: Rethinking Parameter Counting in Statistical Learning
  • Normalization Layers Are All That Sharpness-Aware Minimization Needs
NeurIPS Paper Reviews 2023 #2

Discover the perspectives of Paul, one of our quantitative researchers, on the following papers:

  • Sharpness-Aware Minimization Leads to Low-Rank Features
  • When Do Neural Nets Outperform Boosted Trees on Tabular Data?
NeurIPS Paper Reviews 2023 #4

Discover the perspectives of Dustin, our scientific director, on the following papers:

  • Abide by the law and follow the flow: conservation laws for gradient flows
  • The Tunnel Effect: Building Data Representations in Deep Neural Networks
NeurIPS Paper Reviews 2023 #5

Discover the perspectives of Laurynas, one of our machine learning engineers, on the following papers:

  • Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture
  • QLoRA: Efficient Finetuning of Quantized LLMs
NeurIPS Paper Reviews 2023 #6

Discover the perspectives of Rui, one of our quantitative analysts, on the following papers:

  • Predict, Refine, Synthesize: Self-Guiding Diffusion Models for Probabilistic Time Series Forecasting
  • Conformal Prediction for Time Series with Modern Hopfield Networks