QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs

Saleh Ashkboos, Amirkeivan Mohtashami, Maximilian Croci, Bo Li, Pashmina Cameron, Martin Jaggi, Dan Alistarh, Torsten Hoefler, James Hensman
Large language models (LLMs) often have billions of parameters, typically stored as 16-bit floating-point numbers. Quantising these weights to lower precision (e.g., 4-bit integers) offers significant advantages, including reduced memory usage, lower computational requirements and improved energy efficiency.
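To put rough numbers on the memory saving, here is a back-of-the-envelope sketch (assuming a hypothetical 70-billion-parameter model and counting weight storage only, ignoring activations and the KV cache):

```python
# Approximate weight-memory footprint of a 70B-parameter model at two precisions.
params = 70e9

for bits in (16, 4):
    gigabytes = params * bits / 8 / 1e9
    print(f"{bits:>2}-bit weights: ~{gigabytes:.0f} GB")

# 16-bit weights: ~140 GB
#  4-bit weights: ~35 GB  (a 4x reduction before any quantisation overheads)
```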
In the extreme case, models like BitNet [1] store weights as 1.58-bit values {-1, 0, 1}. However, the weights and activations of LLMs often contain large outliers, which makes low-bit quantisation challenging. In this paper, the authors propose a novel approach: they preprocess the weight matrices with randomised Hadamard transformations (rotations). These rotations effectively remove the outliers, enabling more efficient quantisation.
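As a minimal NumPy/SciPy sketch of the idea (not the authors' implementation), the snippet below builds a random-sign Hadamard rotation, shows that it spreads a few outlier channels across all dimensions, and confirms that folding the counter-rotation into the weights leaves the layer output unchanged; the hidden size and outlier pattern are invented for illustration:

```python
import numpy as np
from scipy.linalg import hadamard

rng = np.random.default_rng(0)
d = 256  # hidden size; Hadamard matrices exist for powers of two

# Randomised Hadamard rotation Q = H @ diag(signs) / sqrt(d); Q is orthogonal.
H = hadamard(d).astype(np.float64)
signs = rng.choice([-1.0, 1.0], size=d)
Q = (H * signs) / np.sqrt(d)

# A toy activation matrix with a few large outlier channels, as observed in LLMs.
X = rng.normal(size=(32, d))
X[:, :4] *= 50.0

X_rot = X @ Q
print("max |x| before rotation:", np.abs(X).max())      # dominated by the outliers
print("max |x| after  rotation:", np.abs(X_rot).max())  # spread across all channels

# The rotation can be absorbed into the next weight matrix, so outputs are unchanged:
W = rng.normal(size=(d, d))
y_ref = X @ W
y_rot = X_rot @ (Q.T @ W)   # rotated activations times counter-rotated weights
print("outputs match:", np.allclose(y_ref, y_rot))      # True
```

Because Q is orthogonal, inserting Q and its transpose around a linear layer is mathematically a no-op, which is why the rotation can be applied without retraining the model.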
Using the GPTQ algorithm [2], which quantises weights without requiring model retraining, the authors achieve end-to-end 4-bit quantisation. They demonstrate that their approach preserves model performance (e.g., only a minimal increase in text perplexity), with large models such as LLaMA2-70B exhibiting the smallest performance drop. Furthermore, their method delivers up to a 3x speedup during inference and a 3.5x reduction in peak memory usage.
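To see why the rotation helps quantisation, the toy example below compares naive 4-bit round-to-nearest error on a weight matrix with and without the rotation (round-to-nearest stands in for GPTQ purely for brevity; the matrix sizes and outlier pattern are invented):

```python
import numpy as np
from scipy.linalg import hadamard

def quantise_rtn(W, bits=4):
    """Symmetric per-tensor round-to-nearest quantisation (a simple stand-in for GPTQ)."""
    levels = 2 ** (bits - 1) - 1
    scale = np.abs(W).max() / levels
    return np.clip(np.round(W / scale), -levels, levels) * scale

rng = np.random.default_rng(0)
d = 256
W = rng.normal(size=(d, d))
W[:4, :] *= 50.0                  # a few outlier rows force a very coarse scale

H = hadamard(d).astype(np.float64)
Q = (H * rng.choice([-1.0, 1.0], size=d)) / np.sqrt(d)
W_rot = Q.T @ W                   # the counter-rotated weights that actually get quantised

err_plain = np.linalg.norm(W - quantise_rtn(W))
err_rot = np.linalg.norm(W_rot - quantise_rtn(W_rot))
print(f"4-bit error, original basis: {err_plain:.0f}")
print(f"4-bit error, rotated basis:  {err_rot:.0f}")   # substantially smaller
```

Since the rotation is orthogonal, the quantisation error in the rotated basis carries over directly to the layer output, so the two numbers are a like-for-like comparison.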
[1] The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits (2024), https://arxiv.org/abs/2402.17764
[2] GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers (2022), https://arxiv.org/abs/2210.17323