NeurIPS paper reviews 2024 #5

23 January 2025
  • News
  • Quantitative research

Dustin, Scientific Director

In this paper review series, our team of researchers and machine learning practitioners discuss the papers they found most interesting at NeurIPS 2024.

Here, discover the perspectives of Dustin, Scientific Director.

QuaRot: Outlier-free 4-bit inference in rotated LLMs

Saleh Ashkboos, Amirkeivan Mohtashami, Maximilian Croci, Bo Li, Pashmina Cameron, Martin Jaggi, Dan Alistarh, Torsten Hoefler, James Hensman

Large language models (LLMs) often have billions of parameters, typically stored as 16-bit floating-point numbers. Quantising these weights to lower precision (e.g. 4-bit integers) offers significant advantages, including reduced memory usage, lower computational requirements and improved energy efficiency.
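As a rough back-of-the-envelope illustration of the memory argument, the sketch below computes the storage needed for the weights alone at different precisions. The 70-billion-parameter count is an assumed example, and the calculation ignores quantisation scales, activations and caches, so it gives an idealised upper bound on the saving rather than a measured figure.

```python
def weight_memory_gb(n_params: int, bits_per_weight: float) -> float:
    """Memory needed to store the weights alone, in gigabytes (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


n_params = 70_000_000_000  # assumed example: a 70B-parameter model
fp16 = weight_memory_gb(n_params, 16)
int4 = weight_memory_gb(n_params, 4)
print(f"16-bit: {fp16:.0f} GB, 4-bit: {int4:.0f} GB, ratio: {fp16 / int4:.1f}x")
# prints "16-bit: 140 GB, 4-bit: 35 GB, ratio: 4.0x"
```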

In the extreme case, models like BitNet [1] store each weight as one of three values {-1, 0, 1}, which corresponds to log2(3) ≈ 1.58 bits per weight. However, LLMs’ weight matrices often contain large outliers, which makes quantisation challenging: a single large value stretches the quantisation range and leaves few levels for the remaining weights. In this paper, the authors propose a novel approach using randomised Hadamard transformations (rotations) to preprocess the weight matrices. These rotations effectively remove outliers, enabling more efficient quantisation.
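To make the effect of the rotation concrete, here is a minimal NumPy sketch, not the authors' implementation. It builds a randomised Hadamard matrix, applies it to a synthetic weight matrix with a few injected outliers, and compares the error of a simple round-to-nearest 4-bit quantiser before and after rotating. The matrix sizes, the outlier pattern, the per-column scaling and the helper names (`random_hadamard`, `fake_quantise_int4`, `demo`) are all assumptions chosen for illustration; the paper itself uses GPTQ rather than plain rounding.

```python
import numpy as np


def hadamard(n):
    """Sylvester construction of the n x n Hadamard matrix (n a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H


def random_hadamard(n, rng):
    """Orthogonal matrix Q = H D / sqrt(n), where D is a random +/-1 diagonal."""
    signs = rng.choice([-1.0, 1.0], size=n)
    return hadamard(n) * signs / np.sqrt(n)


def fake_quantise_int4(W):
    """Round-to-nearest symmetric 4-bit quantisation with one scale per column,
    returned in dequantised (float) form so the error can be measured."""
    scale = np.abs(W).max(axis=0, keepdims=True) / 7.0
    scale = np.where(scale == 0, 1.0, scale)
    return np.clip(np.round(W / scale), -8, 7) * scale


def relative_error(W, W_hat):
    return np.linalg.norm(W - W_hat) / np.linalg.norm(W)


def demo(d=256, m=64, seed=0):
    rng = np.random.default_rng(seed)
    W = 0.02 * rng.normal(size=(d, m))
    cols = np.arange(0, m, 4)
    W[rng.integers(0, d, size=cols.size), cols] = 1.0  # synthetic outliers
    Q = random_hadamard(d, rng)

    err_plain = relative_error(W, fake_quantise_int4(W))
    # Quantise in the rotated basis, then rotate back for a like-for-like comparison.
    err_rotated = relative_error(W, Q @ fake_quantise_int4(Q.T @ W))

    # The rotation leaves the layer output unchanged: (x Q)(Q^T W) = x W.
    x = rng.normal(size=(1, d))
    assert np.allclose((x @ Q) @ (Q.T @ W), x @ W)
    return err_plain, err_rotated


if __name__ == "__main__":
    err_plain, err_rotated = demo()
    print(f"relative error without rotation: {err_plain:.3f}")
    print(f"relative error with rotation:    {err_rotated:.3f}")
```

The rotation spreads each outlier evenly across its column, so the per-column quantisation range shrinks and the remaining weights are no longer rounded to zero. Because Q is orthogonal, the rotated layer computes the same output as the original one, so the rotation can be folded into the weights ahead of time.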

Using the GPTQ algorithm [2], which quantises without requiring model retraining, the authors achieve end-to-end 4-bit quantisation. They demonstrate that their approach preserves model performance (e.g. a minimal increase in text perplexity), with large models such as LLaMA2-70B showing the smallest performance drop. Furthermore, their method delivers a 3x speedup during inference and a 3.5x reduction in peak memory usage.

[1] The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits (2024), https://arxiv.org/abs/2402.17764

[2] GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers (2022), https://arxiv.org/abs/2210.17323

An image is worth 32 tokens for reconstruction and generation

Qihang Yu, Mark Weber, Xueqing Deng, Xiaohui Shen, Daniel Cremers, Liang-Chieh Chen

Generative image models encode images as tokens in a compact latent representation. For example, a 256×256×3 pixel image is typically broken into 8×8 pixel patches and encoded as 32×32 = 1024 vectors, each 16-dimensional. The standard approach assumes that the 2D arrangement of patches is important for tokenisation.
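The patch arithmetic above can be checked with a short NumPy sketch. The `patchify` helper and the random `projection` matrix are hypothetical stand-ins: a real tokeniser would use a learned encoder rather than a random linear map.

```python
import numpy as np


def patchify(image, patch):
    """Split an (H, W, C) image into non-overlapping patch x patch tiles,
    returning an array of shape (num_patches, patch * patch * C)."""
    H, W, C = image.shape
    assert H % patch == 0 and W % patch == 0
    gh, gw = H // patch, W // patch
    tiles = image.reshape(gh, patch, gw, patch, C).transpose(0, 2, 1, 3, 4)
    return tiles.reshape(gh * gw, patch * patch * C)


rng = np.random.default_rng(0)
image = rng.random((256, 256, 3))        # stand-in for a real image
patches = patchify(image, patch=8)       # one row per 8x8x3 patch
projection = rng.normal(size=(192, 16))  # placeholder for a learned encoder
tokens = patches @ projection            # one 16-dimensional vector per patch
print(patches.shape, tokens.shape)
# prints "(1024, 192) (1024, 16)"
```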

In this paper, motivated by previous successes of 1D models applied to 2D images, the authors replace the 2D patch encoding with a smaller 1D latent representation, which is trained using a Vision Transformer model [3].

The 16-dimensional tokens are encoded as 10-bit integers using a vector quantisation (VQ) algorithm with a learnable codebook. The authors show good performance with a 1D latent space of only 32 tokens, giving up to a 400x speedup for image generation over current models.
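As an illustration of the vector quantisation step, the sketch below maps each 16-dimensional token to the index of its nearest entry in a codebook of 2^10 = 1024 vectors, so each index fits in 10 bits. The codebook here is random purely for illustration, whereas in the paper it is learned during training, and the function name `vector_quantise` is an assumption.

```python
import numpy as np


def vector_quantise(z, codebook):
    """Map each row of z to the index of its nearest codebook entry (squared L2)."""
    dists = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)


rng = np.random.default_rng(0)
codebook = rng.normal(size=(2 ** 10, 16))  # 1024 entries, so 10-bit indices
latents = rng.normal(size=(32, 16))        # 32 latent tokens in the 1D representation
indices = vector_quantise(latents, codebook)
quantised = codebook[indices]              # what a decoder would receive

bits_1d = 32 * 10       # 32 tokens at 10 bits each
bits_2d = 32 * 32 * 10  # 1024 patch tokens at 10 bits each
print(indices.shape, bits_1d, bits_2d, bits_2d // bits_1d)
# prints "(32,) 320 10240 32"
```

Using the figures quoted in this review, the 32-token representation is therefore 32 times smaller than a 32×32 grid of tokens drawn from the same codebook.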

[3] Taming Transformers for High-Resolution Image Synthesis (2020), https://arxiv.org/abs/2012.09841

Dimension-free deterministic equivalents and scaling laws for random feature regression

Leonardo Defilippis, Bruno Loureiro, Theodor Misiakiewicz

Recent work has shown the surprising result that overparametrised neural networks (where the number of features p exceeds the number of data points n) exhibit “double descent” [4], and can achieve both zero training set error and low test set error. This phenomenon is also present in simple linear models, e.g. random feature ridge regression (RFRR).

In this paper, the authors consider RFRR and derive a formula for the test set error as a sum of bias and variance terms. They identify different regimes where either bias or variance dominates. Finally, they improve the lower bound on the minimum number of features required to attain the optimal error rate from p* = O(n) to p* = n^(q*), where the exponent q* depends on the power law distribution of the features and data. Their formula shows impressive agreement with both simulated data and real-world data from the FashionMNIST dataset.
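For readers who want to experiment with the setting, the following is a small simulation sketch of RFRR, not the authors' deterministic formula. It fits ridge regression on p random ReLU features and reports training and test error as p varies. The linear ground truth, the noise level, the ridge penalty and the function name `rfrr_errors` are all assumptions made for illustration.

```python
import numpy as np


def rfrr_errors(n, p, d=20, lam=1e-6, noise=0.5, n_test=2000, seed=0):
    """Train and test MSE of random feature ridge regression with p ReLU features."""
    rng = np.random.default_rng(seed)
    beta = rng.normal(size=d) / np.sqrt(d)    # assumed linear ground truth
    W = rng.normal(size=(d, p)) / np.sqrt(d)  # fixed random first-layer weights

    def features(X):
        return np.maximum(X @ W, 0.0) / np.sqrt(p)

    X_train = rng.normal(size=(n, d))
    X_test = rng.normal(size=(n_test, d))
    y_train = X_train @ beta + noise * rng.normal(size=n)
    y_test = X_test @ beta + noise * rng.normal(size=n_test)

    Phi = features(X_train)
    # Dual form of ridge: a = Phi^T (Phi Phi^T + lam I)^{-1} y, an n x n solve for any p.
    alpha = np.linalg.solve(Phi @ Phi.T + lam * np.eye(n), y_train)
    a = Phi.T @ alpha

    train_mse = np.mean((Phi @ a - y_train) ** 2)
    test_mse = np.mean((features(X_test) @ a - y_test) ** 2)
    return train_mse, test_mse


if __name__ == "__main__":
    n = 100
    for p in (25, 50, 100, 200, 400, 1600):
        train_mse, test_mse = rfrr_errors(n, p)
        print(f"p={p:5d}  train MSE={train_mse:.4f}  test MSE={test_mse:.4f}")
```

With a very small ridge penalty, the training error should fall to nearly zero once p exceeds n. The test error typically peaks near p = n and decreases again as p grows, which is the double descent shape, although the exact values depend on the seed and the assumed noise level.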

[4] Two models of double descent for weak features (2019), https://arxiv.org/abs/1903.07571

Read more paper reviews

ICML 2024: Paper review #1

Discover the perspectives of Casey, one of our Machine Learning Engineers, on the following papers:

  • Towards scalable and stable parallelization of nonlinear RNNs
  • Logarithmic math in accurate and efficient AI inference accelerators
Read now
ICML 2024: Paper review #2

Discover the perspectives of Trenton, one of our Software Engineers, on the following papers:

  • FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision
  • Parallelizing Linear Transformers with the Delta Rule over Sequence Length
  • RL-GPT: Integrating Reinforcement Learning and Code-as-policy
Read now
ICML 2024: Paper review #3

Discover the perspectives of Mark, one of our Senior Quantitative Researchers, on the following papers:

  • Why Transformers Need Adam: A Hessian Perspective
  • Poisson Variational Autoencoder
  • Noether’s Razor: Learning Conserved Quantities
Read now
ICML 2024: Paper review #4

Discover the perspectives of Angus, one of our Machine Learning Engineers, on the following papers:

  • einspace: Searching for Neural Architectures from Fundamental Operations
  • SequentialAttention++ for Block Sparsification: Differentiable Pruning Meets Combinatorial Optimization
Read now
ICML 2024: Paper review #6

Discover the perspectives of Georg, one of our Quant Research Managers, on the following papers:

  • Optimal Parallelization of Boosting
  • Learning Formal Mathematics From Intrinsic Motivation
  • Learning on Large Graphs using Intersecting Communities
Read now
ICML 2024: Paper review #7

Discover the perspectives of Cedric, one of our Quantitative Researchers, on the following papers:

  • Preference Alignment with Flow Matching
  • A Generative Model of Symmetry Transformations
Read now
ICML 2024: Paper review #8

Discover the perspectives of Hugh, one of our Scientific Directors, on the following papers:

  • Better by default: Strong pre-tuned MLPs and boosted trees on tabular data
  • Drift-Resilient TabPFN: In-Context Learning Temporal Distribution Shifts on Tabular Data
Read now
ICML 2024: Paper review #9

Discover the perspectives of Andrew, one of our Quant Research Managers, on the following papers:

  • Algorithmic Capabilities of Random Transformers
  • The Road Less Scheduled
  • Time Series in the Age of Large Models
Read now
ICML 2024: Paper review #10

Discover the perspectives of Julian, one of our Quantitative Researchers, on the following papers:

  • Transformers Learn to Achieve Second-Order Convergence Rates for In-Context Linear Regression
  • Unveil Benign Overfitting for Transformer in Vision: Training Dynamics, Convergence, and Generalization
  • Amortized Planning with Large-Scale Transformers: A Case Study on Chess
Read now