
NeurIPS Paper Reviews 2024 #9

7 February 2025
  • News
  • Quantitative Research

Andrew, Quant Research Manager

In this paper review series, our team of researchers and machine learning practitioners discuss the papers they found most interesting at NeurIPS 2024.

Here, discover the perspectives of Quant Research Manager, Andrew.

Algorithmic Capabilities of Random Transformers

Ziqian Zhong, Jacob Andreas

The authors present interesting results that shed some light on the inductive biases inherent in the transformer architecture. By learning only the embedding and readout layers on randomly initialised transformers, they are able to train remarkably strong models on a variety of algorithmic tasks including modular addition and parenthesis balancing, which suggests that core elements of these algorithms are available at initialisation (reminiscent of the “lottery ticket” hypothesis that has been applied to deep neural networks).
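To make the setup concrete, here is a minimal PyTorch-style sketch (our own illustration, not the authors' code) of freezing a randomly initialised transformer body and optimising only the embedding and readout layers; the module choices and dimensions are purely illustrative.

```python
import torch
import torch.nn as nn

# Illustrative sketch: keep the transformer body at its random initialisation
# and learn only the input embedding and the linear readout.
vocab_size, d_model, n_layers = 128, 256, 4

embedding = nn.Embedding(vocab_size, d_model)
body = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True),
    num_layers=n_layers,
)
readout = nn.Linear(d_model, vocab_size)

# Freeze the internal transformer weights at their random initialisation.
for p in body.parameters():
    p.requires_grad = False

# Only the embedding and readout parameters are optimised.
optimizer = torch.optim.Adam(
    list(embedding.parameters()) + list(readout.parameters()), lr=1e-3
)

def forward(tokens: torch.Tensor) -> torch.Tensor:
    h = embedding(tokens)   # learned input embedding
    h = body(h)             # frozen, randomly initialised transformer body
    return readout(h)       # learned linear readout
```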

The authors investigate which of the embedding, positional encoding and readout layers must be learned for success, and show that the answer is task-dependent. For example, decimal addition requires only the embedding layer to be learned, while modular addition also needs learned weights in the readout layer.

In their restricted setting, the authors find that the learned embedding layers in all cases select a low-dimensional subspace for downstream processing, and this appears sufficient to achieve perfect accuracy on some tasks.

For more complex problems (such as general language modelling) it appears that broader use of the transformer’s latent space is required, which can only be achieved effectively by training the internal weights.

It remains to be investigated how solutions for the restricted models relate to those in which all weights are learned and how training dynamics arrive at those solutions.


The Road Less Scheduled

Aaron Defazio, Xingyu Alice Yang, Harsh Mehta, Konstantin Mishchenko, Ahmed Khaled, Ashok Cutkosky

The authors introduce a compelling new optimisation procedure that dispenses with any requirement to define a learning rate schedule or to specify the number of training epochs in advance, and it introduces no additional hyperparameters. Across a wide range of tasks their schedule-free approach matches or exceeds the performance of Adam with a tuned cosine schedule, and typically dominates it across the entire duration of training.

The procedure maintains three sequences (one of which can be reconstructed from the other two, so no additional memory is required beyond standard momentum methods): the first tracks where gradients are evaluated, the second is where the base optimiser update is performed, and the third is a running average that serves as the best estimate of the solution so far. The authors demonstrate that the procedure can be viewed as a novel interpolation between two forms of averaging from the literature, neither of which in isolation gives results competitive with schedule-based training.
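A minimal sketch of how we understand the three sequences interacting, for the SGD variant, is shown below; the variable names and the exact interpolation and averaging constants are illustrative rather than the authors' reference implementation.

```python
def schedule_free_sgd(grad_fn, x0, gamma=0.1, beta=0.9, steps=1000):
    """Hypothetical sketch of a schedule-free SGD loop (not the reference code).

    grad_fn(y) should return the stochastic gradient at the point y.
    """
    z = list(x0)  # base SGD iterate
    x = list(x0)  # running (equal-weighted) average of the z iterates
    for t in range(1, steps + 1):
        # Gradients are evaluated at an interpolation of the average x
        # and the base iterate z.
        y = [beta * xi + (1.0 - beta) * zi for xi, zi in zip(x, z)]
        g = grad_fn(y)
        # Plain SGD step on the base iterate.
        z = [zi - gamma * gi for zi, gi in zip(z, g)]
        # Fold the new iterate into the running average.
        c = 1.0 / (t + 1)
        x = [(1.0 - c) * xi + c * zi for xi, zi in zip(x, z)]
    return x  # the averaged sequence is what is evaluated and deployed
```

Note that y is a deterministic function of x and z, which is why only two sequences need to be stored, matching the memory footprint of standard momentum methods.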

Most researchers use only the final iterate from their training runs, which in practice performs best but conflicts with optimisation theory, where guarantees typically apply to averaged iterates. By deriving a strong optimisation algorithm that uses an average of iterates, the authors have finally reconciled theory and practice. A number of optimisers touted over the past ten years as successors to Adam have largely failed to fulfil their promise; this work looks like one of the strongest candidates yet to take the crown.


Time Series in the Age of Large Models

This invited talk was one of the most entertaining and thought-provoking of the conference; in it, the speaker explored with refreshing scepticism the topic of foundation models for time series.

By digging into the specifics of certain standard datasets commonly used for model assessment, in which the target horizons lie far beyond what could plausibly be forecast (such as fine-grained weather a year ahead), he constructed trivial baseline models able to outperform the latest complex architectures.
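For a flavour of what such a trivial baseline can look like, a seasonal-naive forecast along the following lines (our own illustrative example, not necessarily the one used in the talk) is often surprisingly hard to beat on long-horizon benchmarks:

```python
import numpy as np

def seasonal_naive_forecast(history: np.ndarray, horizon: int, period: int) -> np.ndarray:
    """Forecast the next `horizon` points by repeating the last full seasonal cycle."""
    last_cycle = history[-period:]
    reps = int(np.ceil(horizon / period))
    return np.tile(last_cycle, reps)[:horizon]

# e.g. hourly data with daily seasonality, forecasting one week ahead:
# forecast = seasonal_naive_forecast(series, horizon=24 * 7, period=24)
```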

Such tasks offer very limited guidance in assessing the merits of competing methods, yet are routinely presented as demonstrations of state-of-the-art performance. It was a timely reminder, as the field progresses, that generic time series models need high-quality, heterogeneous data for effective training and evaluation, and that their utility is ultimately predicated on real-world performance.

Of course, this is less of a problem at G-Research: our quants enjoy a huge abundance of diverse data and our forecasting problem, firmly grounded in reality, is among the most challenging – and most rewarding – anywhere on the planet!
