
NeurIPS Paper Reviews 2024 #8

7 February 2025
  • News
  • Quantitative Research

Hugh, Scientific Director

In this paper review series, our team of researchers and machine learning practitioners discuss the papers they found most interesting at NeurIPS 2024.

Here, discover the perspectives of Scientific Director, Hugh.

Better by default: Strong pre-tuned MLPs and boosted trees on tabular data

David Holzmüller, Leo Grinsztajn, Ingo Steinwart

Over the years there have been endless papers benchmarking neural networks and boosted trees, usually concluding that the trees are better for tabular data. This paper revisits this comparison and proposes a bag of tricks for improving simple neural network models.

The tricks include preprocessing methods, new learning rate schedules, additional scaling layers and data-driven weight initialisations. Nothing in this paper is especially groundbreaking, but that is the point: getting all of these details right is what delivers good performance.
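As a loose illustration of the flavour of these tricks (a minimal sketch, not the paper's exact recipe; the layer and function names below are made up for illustration), a learnable per-feature scaling layer and a simple data-driven weight initialisation might look like this in PyTorch:

import torch
import torch.nn as nn

class ScalingLayer(nn.Module):
    # Learnable per-feature scale applied to the inputs before the first linear layer.
    def __init__(self, num_features):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(num_features))

    def forward(self, x):
        return x * self.scale

def data_driven_init(linear, x):
    # Rescale the randomly initialised weights so that pre-activations
    # have roughly unit standard deviation on a batch of real data.
    with torch.no_grad():
        std = linear(x).std(dim=0, keepdim=True).clamp(min=1e-6)  # shape (1, out_features)
        linear.weight /= std.T                                     # broadcast over input dims

# Hypothetical usage on standardised tabular features
x = torch.randn(256, 30)
model = nn.Sequential(ScalingLayer(30), nn.Linear(30, 64), nn.ReLU(), nn.Linear(64, 1))
data_driven_init(model[1], model[0](x))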

Another contribution of the paper is a new benchmark of tabular datasets, larger in scale than previous efforts. The paper also compares hyperparameter tuning against ensembling, and throughout makes comparisons at differing computational budgets.

This is a very practical paper that suggests many sensible ideas that practitioners will find useful.


Drift-Resilient TabPFN: In-Context Learning Temporal Distribution Shifts on Tabular Data

Kai Helli, David Schnurr, Noah Hollmann, Samuel Müller, Frank Hutter

This paper builds on the original TabPFN work in a simple but quite powerful way.

TabPFN is an in-context learning method based on a transformer architecture trained on synthetic data to perform supervised learning. Essentially, it is a model trained to invert data-generating processes from samples.

In general, inverting a data-generating process is very hard: the Bayesian approach essentially enumerates all possible generating processes and averages their predictions, weighted by how likely each one makes the observed data, and only special cases are tractable. In this approach, however, the same learning algorithm is used regardless of the data-generating process, so the data can be generated in arbitrarily complex ways.
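To make that step concrete: writing φ for a candidate data-generating process and D for the observed training set, the Bayesian answer is the posterior-predictive average

p(y | x, D) ∝ ∫ p(y | x, φ) p(D | φ) p(φ) dφ,

which is intractable except in special cases; TabPFN instead amortises it by training a transformer that maps (D, x) directly to a predictive distribution over y.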

The original TabPFN paper introduced complexity through random structural causal models. This new paper goes a step further and adds drift in the underlying structure, leading to datasets that change over time (time is given as an input feature) in random ways. The model then learns to identify the drift in the causal model and make extrapolations that are consistent with this drift. Very few methods outside very simple linear models can capture this effect.
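As a rough, purely illustrative picture of the kind of synthetic task this implies (the paper's actual prior samples random structural causal models; the toy below just makes the label-generating coefficients drift with a time feature):

import numpy as np

rng = np.random.default_rng(0)

def sample_drifting_dataset(n=500):
    # Toy data-generating process whose parameters drift with time.
    t = np.sort(rng.uniform(0.0, 1.0, size=n))             # time is an observed feature
    x = rng.normal(size=(n, 2))
    w = np.stack([1.0 + 2.0 * t, -1.0 + 3.0 * t], axis=1)  # coefficients drift with t
    noise = 0.1 * rng.normal(size=n)
    y = (np.sum(w * x, axis=1) + noise > 0).astype(int)
    return np.column_stack([x, t]), y                       # model sees (x, t) as features

X, y = sample_drifting_dataset()
# A drift-aware in-context learner is trained on many such datasets and then asked to
# predict labels at time values beyond those seen in its context.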


Trace is the Next AutoDiff: Generative Optimization with Rich Feedback, Execution Traces, and LLMs

Ching-An Cheng, Allen Nie, Adith Swaminathan

This paper presents a new paradigm for optimisation, where an LLM is used to perform end-to-end optimisation over code.

One way to think about it is as generalising automatic differentiation to non-numeric inputs: the 'gradient' (float-valued) of usual AutoDiff becomes 'feedback for improvement' (string-valued). The key motivation is to use rich feedback, rather than simply a scalar gradient.

In the usual RL setting the only feedback is the scalar reward, whereas here the feedback could be a stack trace, a human annotation, or anything else. The technical contribution of the paper is an algorithm for presenting the optimisation components to the LLM efficiently, rather than, for example, calling a pseudo Jacobian-vector product for each operation (though they implement this as well).

The thing that most excites me about this paper is that it uses LLMs in a loop with non-LLM parts, for example executing code and evaluating the objective. This is very powerful: the non-LLM parts anchor the LLM to do something reasonable, and if it does hallucinate some nonsense, that nonsense gets discarded.
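A hand-wavy sketch of that loop (not the actual Trace API; propose_fix stands in for an arbitrary LLM call and evaluate for the user's own objective function):

import traceback

def optimise_with_llm(initial_code, evaluate, propose_fix, steps=10):
    # Iteratively improve a piece of code using rich, string-valued feedback.
    code, best_code, best_score = initial_code, initial_code, float("-inf")
    for _ in range(steps):
        try:
            score = evaluate(code)   # non-LLM part: execute the code, measure the objective
            feedback = f"Objective value: {score:.4f}. Try to improve it."
        except Exception:
            score = float("-inf")
            feedback = "Execution failed:\n" + traceback.format_exc()   # stack trace as feedback
        if score > best_score:
            best_code, best_score = code, score
        code = propose_fix(code, feedback)   # LLM part: the 'gradient' is a string
    return best_code, best_score

Keeping only the best-scoring candidate means that any hallucinated code which fails to execute or scores poorly is simply discarded by the non-LLM parts of the loop.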

One view is to see the LLM as a general-purpose computer and this method as a way to harness it. Another is to see LLMs as a general-purpose prior, and this as an algorithm that uses that prior in an RL loop.


Read more paper reviews

NeurIPS 2024: Paper Review #1

Discover the perspectives of Casey, one of our Machine Learning Engineers, on the following papers:

  • Towards scalable and stable parallelization of nonlinear RNNs
  • Logarithmic Math in Accurate and Efficient AI Inference Accelerators
Read now
NeurIPS 2024: Paper Review #2

Discover the perspectives of Trenton, one of our Software Engineers, on the following papers:

  • FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision
  • Parallelizing Linear Transformers with the Delta Rule over Sequence Length
  • RL-GPT: Integrating Reinforcement Learning and Code-as-policy
Read now
NeurIPS 2024: Paper Review #3

Discover the perspectives of Mark, one of our Senior Quantitative Researchers, on the following papers:

  • Why Transformers Need Adam: A Hessian Perspective
  • Poisson Variational Autoencoder
  • Noether’s Razor: Learning Conserved Quantities
Read now
NeurIPS 2024: Paper Review #4

Discover the perspectives of Angus, one of our Machine Learning Engineers, on the following papers:

  • einspace: Searching for Neural Architectures from Fundamental Operations
  • SequentialAttention++ for Block Sparsification: Differentiable Pruning Meets Combinatorial Optimization
Read now
NeurIPS 2024: Paper Review #5

Discover the perspectives of Dustin, one of our Scientific Directors, on the following papers:

  • QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs
  • An Image is Worth 32 Tokens for Reconstruction and Generation
  • Dimension-free deterministic equivalents and scaling laws for random feature regression
Read now
NeurIPS 2024: Paper Review #6

Discover the perspectives of Georg, one of our Quant Research Managers, on the following papers:

  • Optimal Parallelization of Boosting
  • Learning Formal Mathematics From Intrinsic Motivation
  • Learning on Large Graphs using Intersecting Communities
Read now
NeurIPS 2024: Paper Review #7

Discover the perspectives of Cedric, one of our Quantitative Researchers, on the following papers:

  • Preference Alignment with Flow Matching
  • A Generative Model of Symmetry Transformations
Read now
NeurIPS 2024: Paper Review #9

Discover the perspectives of Andrew, one of our Quant Research Managers, on the following papers:

  • Algorithmic Capabilities of Random Transformers
  • The Road Less Scheduled
  • Time Series in the Age of Large Models
Read now
NeurIPS 2024: Paper Review #10

Discover the perspectives of Julian, one of our Quantitative Researchers, on the following papers:

  • Transformers Learn to Achieve Second-Order Convergence Rates for In-Context Linear Regression
  • Unveil Benign Overfitting for Transformer in Vision: Training Dynamics, Convergence, and Generalization
  • Amortized Planning with Large-Scale Transformers: A Case Study on Chess
Read now
