
NeurIPS Paper Reviews 2024 #1

23 January 2025
  • News
  • Quantitative Research

Casey, Machine Learning Engineer

In this paper review series, our team of researchers and machine learning practitioners discuss the papers they found most interesting at NeurIPS 2024.

Here, discover the perspectives of Machine Learning Engineer, Casey.

Towards scalable and stable parallelization of nonlinear RNNs

Xavier Gonzalez, Andrew Warrington, Jimmy T.H. Smith, Scott W. Linderman

This paper builds on previous work, which showed that a non-linear RNN forward pass of length L does not need to be evaluated sequentially. Instead, all L states can be evaluated concurrently.

The evaluation of the RNN is reformulated as a non-linear least squares problem: given the starting state, guesses for all of the remaining states are iteratively refined in parallel by minimizing the objective, as shown in Figure 1 from the original paper.

Figure 1 from “Parallelizing Non-linear Sequential Models over the Sequence Length”
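
To make the reformulation concrete, here is a minimal NumPy sketch (not the authors' code) of the parallel view: all L states start from a guess and are repeatedly updated together from the previous iterate, rather than being rolled out one step at a time. For simplicity it uses a plain fixed-point sweep instead of the Newton-style updates discussed below; the toy transition function, dimensions and sweep count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, L = 4, 100                        # state dimension and sequence length (illustrative)
W = rng.normal(scale=0.5, size=(d, d))
X = rng.normal(size=(L, d))          # inputs x_1 ... x_L
s0 = np.zeros(d)                     # given starting state

def f(s_prev, x):
    """Toy non-linear RNN transition s_t = tanh(W s_{t-1} + x_t); works on a batch of states too."""
    return np.tanh(s_prev @ W.T + x)

def sequential_rnn():
    """Standard forward pass: L strictly sequential steps."""
    s, out = s0, []
    for x in X:
        s = f(s, x)
        out.append(s)
    return np.stack(out)

def parallel_rnn(n_sweeps=100):
    """Evaluate all L states together by iterative refinement.

    Each sweep recomputes every state from the previous iterate of its
    predecessor, so the L updates within one sweep are independent and can
    run in parallel; the residuals r_t = s_t - f(s_{t-1}, x_t) shrink to zero.
    """
    S = np.zeros((L, d))                          # initial guess for all states
    for _ in range(n_sweeps):
        prev = np.vstack([s0[None, :], S[:-1]])   # shifted states s_0 ... s_{L-1}
        S = f(prev, X)                            # update all L states at once
    return S

assert np.allclose(sequential_rnn(), parallel_rnn(), atol=1e-6)
```

Each sweep is a single batched operation over the whole sequence; the point of the Newton-style methods in the paper is to keep the number of such sweeps small while keeping the iteration stable.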

The authors of this paper note that the method used to solve the non-linear least squares problem in the previous work is unstable and can easily fail to converge.

That method, Gauss-Newton, works by iteratively linearizing the non-linear residual around the current solution and solving the resulting linear least squares problem to obtain a new estimate. If the residual function has sufficiently high curvature, this linear approximation will be poor and cause the optimization to fail to converge.

The authors instead propose using the Levenberg-Marquardt algorithm, which adds a “trust region” constraint that stabilizes the iteration: the objective is augmented with a penalty on solutions that move too far from the previous one. Essentially, this encodes the fact that the linearization is only accurate within a certain radius of the previous solution, and it prevents the optimization from taking steps so large that the objective diverges.
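
A minimal sketch of the difference, assuming a generic residual function rather than the paper's RNN-specific one: Gauss-Newton solves the normal equations (J^T J) \delta = -J^T r, while Levenberg-Marquardt adds a damping term \lambda I that shortens the step when the linearization cannot be trusted, growing \lambda when a step fails to reduce the objective and shrinking it when it succeeds. This is the textbook damping scheme, not necessarily the exact trust-region logic used by the authors.

```python
import numpy as np

def lm_step(residual, jacobian, s, lam):
    """One damped Gauss-Newton (Levenberg-Marquardt) update.

    lam = 0 recovers plain Gauss-Newton; larger lam shortens the step,
    keeping it inside a region where the linearization is accurate.
    """
    r, J = residual(s), jacobian(s)
    A = J.T @ J + lam * np.eye(s.size)
    return s + np.linalg.solve(A, -J.T @ r)

def solve_lm(residual, jacobian, s, lam=1e-2, n_steps=100):
    """Adapt the damping: accept a step only if it lowers 0.5 * ||r||^2."""
    cost = lambda v: 0.5 * np.sum(residual(v) ** 2)
    for _ in range(n_steps):
        s_new = lm_step(residual, jacobian, s, lam)
        if cost(s_new) < cost(s):
            s, lam = s_new, lam * 0.5     # good step: trust the linearization more
        else:
            lam *= 2.0                    # bad step: damp harder and retry
    return s
```

In the RNN setting, `residual` stacks r_t = s_t - f(s_{t-1}, x_t) over all time steps and `s` stacks all the unknown states.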

The authors also propose quasi-Newton methods, noting that the particular structure of the matrix involved in the optimization problem allows the Jacobian to be approximated cheaply, yielding much faster optimization. (The Jacobian of the stacked residuals is block lower bidiagonal: identity blocks on the diagonal and Jacobians of the transition function just below it, which is the structure these approximations exploit.)

RNNs are a classic example of inherently sequential models, so it is notable that algorithms from optimization can reframe the problem so fundamentally.


Logarithmic math in accurate and efficient AI inference accelerators

This industry talk discussed innovative approaches to reducing power consumption in machine learning hardware.

The central premise revolves around the significant power and area efficiency of additions compared to multiplications: a 16-bit addition uses 22 times less power and 25 times less chip area than a 16-bit floating-point multiplication. This difference is critical as modern machine learning workloads are heavily dominated by computations of the form ab + c, commonly known as “multiply-accumulate” (MAC).

The speaker proposed an alternative approach to floating-point arithmetic, leveraging a logarithmic number system (LNS) to achieve similar numerical accuracy while drastically reducing power consumption. The proposed 8-bit LNS format, illustrated below, consists of 1 sign bit, 4 integer bits, and 3 fraction bits.

Figure 2: An 8-bit logarithmic number has 1 sign bit, 4 integer bits and 3 fraction bits. In contrast to floating point, it has fixed precision.
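
As a rough illustration of what such a format stores, here is a small Python sketch (not the speaker's actual bit layout) that quantizes \lg |x| to a fixed-point number with 4 integer bits and 3 fraction bits plus a separate sign bit; whether the magnitude field is two's complement or biased is an assumption made here for concreteness.

```python
import numpy as np

SIGN_BITS, INT_BITS, FRAC_BITS = 1, 4, 3        # layout from Figure 2

def encode_lns(x):
    """Encode a nonzero real x as (sign bit, fixed-point log2|x|).

    The magnitude field holds log2|x| rounded to a resolution of
    2**-FRAC_BITS = 0.125, stored as a signed integer (assumed two's complement).
    """
    sign = 0 if x >= 0 else 1
    q = int(np.round(np.log2(abs(x)) * 2**FRAC_BITS))
    lo, hi = -2**(INT_BITS + FRAC_BITS - 1), 2**(INT_BITS + FRAC_BITS - 1) - 1
    return sign, int(np.clip(q, lo, hi))          # saturate to the representable range

def decode_lns(sign, q):
    """Recover the (approximate) real value represented by the code."""
    return (-1.0) ** sign * 2.0 ** (q / 2**FRAC_BITS)

sign, q = encode_lns(6.0)
print(decode_lns(sign, q))    # ~6.17: the nearest representable value to 6.0
```

Because representable magnitudes are spaced uniformly in log space, the relative precision is the same everywhere in the range.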

To compute the logarithmic equivalent of a multiply-accumulate operation, i.e. \lg(ab + c) from the three numbers \lg a, \lg b and \lg c, two mathematical properties enable efficient calculation in log space:

  • Logarithm of a product: \lg(ab) = \lg a + \lg b
  • Mitchell approximation: \lg(1 + x) \approx x for 0 \le x \le 1. This approximates the binary logarithm with its secant line between 1 and 2.

Combining these properties, the logarithmic MAC can be reduced to a sequence of integer additions and bit shifts.
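
Here is a rough floating-point Python sketch of that reduction (illustrative only, assuming positive operands; a real accelerator would work directly on the fixed-point bit patterns, where the divisions by powers of two below become bit shifts):

```python
import numpy as np

def lg(x):
    return np.log2(x)

def lns_mac(la, lb, lc):
    """Approximate lg(a*b + c) given la = lg a, lb = lg b, lc = lg c.

    The product is exact: lg(a*b) = la + lb, a single addition.
    The accumulation uses lg(x + y) = lmax + lg(1 + 2**-(lmax - lmin)),
    with both 2**(.) and lg(1 + .) replaced by Mitchell's secant
    approximation, so only adds, subtracts and shifts remain.
    """
    lp = la + lb                                 # multiply becomes an addition
    lmax, lmin = max(lp, lc), min(lp, lc)
    d = lmax - lmin                              # non-negative distance in log space
    i, frac = int(np.floor(d)), d - np.floor(d)  # integer and fractional parts of d
    t = (2.0 - frac) / 2.0 / 2**i                # ~2**-d via Mitchell: 2**(1-frac) ~ 2 - frac, then shift
    return lmax + t                              # Mitchell again: lg(1 + t) ~ t for 0 <= t <= 1

# Quick accuracy check against exact arithmetic (illustrative values)
a, b, c = 3.0, 5.0, 7.0
approx = 2 ** lns_mac(lg(a), lg(b), lg(c))
print(approx, a * b + c)    # ~20.9 vs 22: Mitchell's approximation costs a few percent of accuracy
```

In fixed point, the additions and the subtraction 2 - frac are plain integer operations and the divisions by powers of two are right shifts, which is the add-and-shift datapath described in the talk.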

The adoption of LNS in machine learning hardware would represent a fundamental shift in how computations are performed. This approach contrasts sharply with the dominant paradigm used in GPUs, such as those developed by Nvidia, which rely on floating-point arithmetic for neural network computations. As AI models continue to proliferate across industries, innovations like LNS-based hardware could play a pivotal role in shaping the future of sustainable AI.


