NeurIPS Paper Reviews 2024 #4

23 January 2025
  • News
  • Quantitative Research

Angus, Machine Learning Engineer

In this paper review series, our team of researchers and machine learning practitioners discuss the papers they found most interesting at NeurIPS 2024.

Here, discover the perspectives of Machine Learning Engineer, Angus.

einspace: Searching for Neural Architectures from Fundamental Operations

Linus Ericsson, Miguel Espinosa, Chenhongyi Yang, Antreas Antoniou, Amos Storkey, Shay B. Cohen, Steven McDonagh, Elliot J. Crowley

While the transformer architecture has dominated much of the machine learning landscape in recent years, new model architectures always hold the promise of improving accuracy or efficiency on a particular machine learning task.

This paper introduces a new architecture search space (known as einspace, by analogy to the einsum operation) over which to perform automated neural architecture search (NAS). The search space is defined by a parameterised probabilistic context-free grammar over a range of branching, aggregation, routing and computation operations.
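To make this concrete, here is a minimal, purely illustrative sketch of what sampling architectures from such a grammar could look like; the symbols, production rules and probabilities below are stand-ins rather than the grammar actually defined in the paper:

```python
import random

# A toy probabilistic context-free grammar over network-building operations,
# loosely in the spirit of einspace's branching / aggregation / routing /
# computation primitives. The symbols, rules and probabilities here are
# illustrative assumptions, not the grammar defined in the paper.
GRAMMAR = {
    "module": [
        (0.4, ["compute"]),                                  # a single computation op
        (0.3, ["sequential", "module", "module"]),           # routing: compose two modules
        (0.3, ["branch", "module", "module", "aggregate"]),  # branch, then aggregate
    ],
    "compute":   [(0.4, ["linear"]), (0.3, ["conv3x3"]), (0.3, ["self_attention"])],
    "aggregate": [(0.5, ["add"]), (0.5, ["concat"])],
}

def sample(symbol="module", depth=0, max_depth=4):
    """Sample an architecture tree by recursively expanding non-terminals."""
    if symbol not in GRAMMAR:            # terminal: an actual operation name
        return symbol
    rules = GRAMMAR[symbol]
    if depth >= max_depth:               # force termination once the tree is deep
        rules = rules[:1]                # the first rule of each symbol is non-recursive
    weights = [w for w, _ in rules]
    expansion = random.choices([e for _, e in rules], weights=weights)[0]
    children = [sample(s, depth + 1, max_depth) for s in expansion]
    return children[0] if len(children) == 1 else children

print(sample())
# e.g. ['sequential', 'conv3x3', ['branch', 'linear', 'self_attention', 'add']]
```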

While many NAS search spaces are either overly restrictive or require the search algorithm to reinvent basic operations from first principles, einspace is flexible enough to encode many popular architectures, including ResNets, transformers and MLP-Mixer, while still retaining relatively high-level building blocks.

Using a simple evolutionary algorithm that mutates the best-performing architectures in a population according to the production rules of the probabilistic context-free grammar, einspace was competitive with many more complex NAS methods. It proved especially effective when used to mutate existing state-of-the-art architectures, improving them on almost every task the authors tested.
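And a rough sketch of how grammar-guided mutation and selection could be wired together, reusing GRAMMAR and sample() from the sketch above; the fitness function, population sizes and mutation rule are again illustrative placeholders rather than the authors' procedure:

```python
import copy
import random

# (Assumes GRAMMAR and sample() from the sketch above.)

def module_slots(tree, path=()):
    """Paths to positions in the tree that hold a 'module' subtree."""
    slots = []
    if isinstance(tree, list) and tree[0] in ("sequential", "branch"):
        for i in (1, 2):                   # both composite rules have two module children
            slots.append(path + (i,))
            slots += module_slots(tree[i], path + (i,))
    return slots

def mutate(tree):
    """Regrow one randomly chosen module subtree via the production rules."""
    tree = copy.deepcopy(tree)
    slots = module_slots(tree)
    if not slots:                          # no composite structure: resample the root
        return sample()
    *parents, last = random.choice(slots)
    node = tree
    for i in parents:
        node = node[i]
    node[last] = sample()
    return tree

def fitness(tree):
    # Stand-in for training and validating the candidate architecture;
    # here we simply prefer larger trees so the sketch runs end to end.
    return len(str(tree))

population = [sample() for _ in range(20)]
for generation in range(10):
    survivors = sorted(population, key=fitness, reverse=True)[:5]
    population = survivors + [mutate(random.choice(survivors)) for _ in range(15)]
```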

It will be interesting to see how einspace performs when more sophisticated and efficient search algorithms are developed for it, and whether the search space can be further extended to include recurrent network architectures. If nothing else, the paper is worth looking at just for all of the pretty pictures of different networks’ einspace representations.


SequentialAttention++ for Block Sparsification: Differentiable Pruning Meets Combinatorial Optimization

Taisuke Yasuda, Kyriakos Axiotis, Gang Fu, MohammadHossein Bateni, Vahab Mirrokni

Neural network pruning is a technique for improving the efficiency and generalisation ability of neural networks by replacing dense parameter tensors with sparse approximations.

Traditional sparsification techniques offer no guarantee of structure in the resulting sparse matrix. Most inference hardware cannot efficiently exploit unstructured sparsity, so while fewer FLOPs are required in principle, in practice these efficiency gains cannot be realised.

In contrast, block sparsification involves masking out contiguous blocks in parameter tensors, so that parallel computing primitives such as CUDA thread blocks can be mapped directly onto the resulting sparse block structure, leading to much more tangible performance gains.
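To illustrate the basic idea (with arbitrary choices of block size, scoring rule and sparsity level, not those used in the paper), the sketch below scores contiguous blocks of a weight matrix by magnitude and drops the lowest-scoring ones wholesale:

```python
import numpy as np

def block_sparsify(weights, block_size=32, sparsity=0.9):
    """Zero out all but the highest-magnitude (block_size x block_size) blocks."""
    rows, cols = weights.shape
    assert rows % block_size == 0 and cols % block_size == 0
    # View the matrix as a grid of blocks and score each block by its Frobenius norm.
    blocks = weights.reshape(rows // block_size, block_size,
                             cols // block_size, block_size)
    scores = np.sqrt((blocks ** 2).sum(axis=(1, 3)))
    # Keep the top (1 - sparsity) fraction of blocks, drop the rest wholesale.
    keep = max(1, int(round(scores.size * (1 - sparsity))))
    threshold = np.sort(scores, axis=None)[-keep]
    mask = np.kron(scores >= threshold, np.ones((block_size, block_size)))
    return weights * mask

W = np.random.randn(256, 256)
W_sparse = block_sparsify(W)
print((W_sparse != 0).mean())   # roughly 0.1: surviving weights sit in contiguous blocks
```

Because the zeros now come in whole blocks, a kernel can skip the pruned blocks entirely rather than testing individual entries, which is where the practical speed-up comes from.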

As the title suggests, this paper introduces a framework for combining differentiable pruning methods with combinatorial optimisation algorithms, where the former method is used for determining important entries in weight matrices, and the latter for iteratively constructing the block sparse matrix based on these importance scores.

Using this approach, the authors show that a wide variety of differentiable pruning techniques can be viewed as nonconvex regularisers that generalise the group LASSO, and that a wide class of such regularisers have unique solutions that coincide with the solution of a corresponding group LASSO problem.
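For reference, and in standard notation rather than the paper's exact formulation, the group LASSO objective that these regularisers generalise penalises the Euclidean norm of each parameter group, so that whole groups (here, blocks) are driven to zero together:

$$
\min_{W}\; \mathcal{L}(W) \;+\; \lambda \sum_{g \in \mathcal{G}} \lVert W_g \rVert_2 ,
$$

where $\mathcal{G}$ is a partition of the weights into blocks and $W_g$ denotes the entries of block $g$.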

Building on this framework, the authors propose a new block pruning algorithm called SequentialAttention++, which fuses the Sequential Attention differentiable pruning technique with an algorithm called ACDC. ACDC alternates between dense and sparse training phases, sparsifying the weight matrices while allowing the sparse support to change over the course of training in case it was chosen sub-optimally early on.
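The sketch below shows what such an alternating dense/sparse schedule could look like for block pruning; the magnitude-based block scoring, phase lengths and plain SGD update are illustrative stand-ins, not the actual SequentialAttention++ or ACDC procedure:

```python
import numpy as np

def top_k_block_mask(weights, block_size, keep_blocks):
    """Entry-level mask keeping the keep_blocks highest-magnitude blocks."""
    rows, cols = weights.shape
    blocks = weights.reshape(rows // block_size, block_size,
                             cols // block_size, block_size)
    scores = np.sqrt((blocks ** 2).sum(axis=(1, 3)))          # per-block magnitude
    threshold = np.sort(scores, axis=None)[-keep_blocks]
    return np.kron(scores >= threshold, np.ones((block_size, block_size)))

W = np.random.randn(128, 128)
block_size, keep_blocks, lr = 16, 8, 1e-2
for phase in range(6):
    dense_phase = (phase % 2 == 0)
    if dense_phase:
        mask = np.ones_like(W)            # dense phase: every weight is trainable again
    else:
        # Sparse phase: re-select the block support from the current magnitudes,
        # so a block dropped earlier can re-enter if it grew back while dense.
        mask = top_k_block_mask(W, block_size, keep_blocks)
        W *= mask                         # zero out the pruned blocks
    for step in range(100):
        grad = np.random.randn(*W.shape)  # stand-in for a real loss gradient
        W -= lr * grad * mask             # masked update keeps pruned blocks at zero
```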

In the authors’ experiments, SequentialAttention++ consistently outperforms a number of other sparsification methods on both the ImageNet and Criteo datasets, in terms of both validation loss and accuracy, across sparsities ranging from 90% to 99%.

Moreover, SequentialAttention++ performs particularly well relative to the baselines at large block sizes and high sparsities, which is precisely the regime in which hardware efficiency improvements are most pronounced, making the approach very attractive for real-world scenarios where model size or inference latency is critical.

