
ICML 2024: Paper Review #4

24 September 2024
  • Quantitative Research

Machine Learning (ML) is a fast-evolving discipline, so attending conferences and hearing about the very latest research are key to the ongoing development and success of our quantitative researchers and ML engineers.

In this paper review series, our ICML 2024 attendees reveal the research and papers they found most interesting.

Here, discover the perspectives of Evgeni, a Senior Quantitative Researcher, as he discusses his most compelling findings from the conference.

Trained Random Forests Completely Reveal your Dataset

Julien Ferry, Ricardo Fukasawa, Timothée Pascal, Thibaut Vidal

This paper presents a framework for reconstructing tabular training data from a trained random forest model. The approach is based on solving an optimisation problem where the constraints are derived from the random forest’s structure (such as the number of trees and their depth) and the dataset’s characteristics (such as the number of examples and features).
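
To make this concrete, below is a toy sketch of reconstruction as constraint solving. It is an illustration under strong assumptions rather than the paper's actual formulation: a small forest is trained on binary features with bagging disabled, each tree's per-leaf sample counts are read off, and an OR-Tools CP-SAT model searches for a feature matrix consistent with every count. The variable names and the choice of solver are mine, not the authors'.

```python
# Toy illustration (an assumption-laden sketch, not the paper's exact
# MILP/CP formulation): recover binary training features from a forest
# trained WITHOUT bootstrap sampling, using Google OR-Tools CP-SAT.
import numpy as np
from ortools.sat.python import cp_model
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n, d = 8, 5                                   # examples, binary features
X = rng.integers(0, 2, size=(n, d))
y = X[:, 0] ^ X[:, 1]                         # a learnable label

forest = RandomForestClassifier(
    n_estimators=10, bootstrap=False, random_state=0).fit(X, y)

model = cp_model.CpModel()
# Unknowns: bit x[i][j] of training example i, feature j.
x = [[model.NewBoolVar(f"x_{i}_{j}") for j in range(d)] for i in range(n)]

for t, est in enumerate(forest.estimators_):
    tree = est.tree_
    # Enumerate leaves together with the tests on their root-to-leaf path.
    leaves, stack = [], [(0, [])]
    while stack:
        node, path = stack.pop()
        if tree.children_left[node] == -1:    # leaf node
            leaves.append((node, path))
        else:
            f = tree.feature[node]            # threshold is 0.5 for 0/1 data
            stack.append((tree.children_left[node], path + [(f, 0)]))
            stack.append((tree.children_right[node], path + [(f, 1)]))
    members = {leaf: [] for leaf, _ in leaves}
    for i in range(n):
        row = []
        for leaf, path in leaves:
            b = model.NewBoolVar(f"b_{t}_{i}_{leaf}")
            # If example i lands in this leaf, its bits must match the path.
            lits = [x[i][f] if bit else x[i][f].Not() for f, bit in path]
            model.AddBoolAnd(lits).OnlyEnforceIf(b)
            members[leaf].append(b)
            row.append(b)
        model.AddExactlyOne(row)              # each example hits one leaf
    for leaf, bs in members.items():
        # Without bagging, n_node_samples is the exact number of training
        # examples reaching the leaf: the constraint the attack exploits.
        model.Add(sum(bs) == int(tree.n_node_samples[leaf]))

solver = cp_model.CpSolver()
assert solver.Solve(model) in (cp_model.OPTIMAL, cp_model.FEASIBLE)
X_hat = np.array([[solver.Value(v) for v in row] for row in x])
# With enough trees, X_hat typically matches X up to a row permutation.
```

In this easy regime the recovered matrix typically matches the training data up to a row permutation; with bagging enabled the leaf counts no longer describe the full training set, and the problem becomes much harder.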

The success of the reconstruction varies depending on the level of bagging randomisation and the nature of the data. For simpler cases, such as when there is little randomisation and there are only binary features, the method can nearly perfectly reconstruct the training data.

However, in more practical scenarios—such as those involving more extensive randomisation, larger datasets or real-valued features—the reconstruction error increases significantly. Further, API-only attacks, where the attacker has limited visibility into the model, are not feasible under this framework.

Much of the existing literature on data privacy and reconstruction attacks focuses on neural networks, especially in domains like images and text. This paper highlights that training data reconstruction is a more general property of machine learning models. This observation puts the data privacy conversation in a better context – focusing us more on properties of the data and the learning algorithm that determine reconstruction success, rather than just on the model architecture.


Test-of-time Award: DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition

Jeff Donahue, Yangqing Jia, Oriol Vinyals, Judy Hoffman, Ning Zhang, Eric Tzeng, Trevor Darrell

Downloading a pretrained neural network and lightly adapting it to a new custom task, all with industry-standard open-source tools, is a typical workflow for many researchers and engineers today. Ten years ago, DeCAF was one of the first works to envisage this reality.

The authors showed that image features extracted from an AlexNet pretrained on one task (ImageNet classification) provide a strong foundation for a wide variety of other tasks: adapting them to a new task typically required only a small model trained on top. A broad range of experiments, in which DeCAF substantially improved on the baselines, confirmed the practical relevance of this idea.
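
As a flavour of how this workflow looks with today's tooling, here is a minimal sketch of DeCAF-style feature extraction. It uses torchvision and scikit-learn as stand-ins for the original Caffe pipeline; the `extract_features` helper and the commented-out image and label variables are hypothetical:

```python
import torch
from torchvision import models
from torchvision.models import AlexNet_Weights
from sklearn.linear_model import LogisticRegression

# Pretrained AlexNet; the weights enum ships the matching preprocessing.
weights = AlexNet_Weights.IMAGENET1K_V1
net = models.alexnet(weights=weights).eval()
preprocess = weights.transforms()

@torch.no_grad()
def extract_features(images):
    """Return 4096-d fc7-style activations for a list of PIL images."""
    batch = torch.stack([preprocess(img) for img in images])
    feats = net.avgpool(net.features(batch)).flatten(1)
    for layer in net.classifier[:-1]:   # drop the ImageNet-specific head
        feats = layer(feats)
    return feats.numpy()

# Hypothetical usage on a new task (train_images etc. are placeholders):
# X_train = extract_features(train_images)
# clf = LogisticRegression(max_iter=1000).fit(X_train, train_labels)
# print(clf.score(extract_features(test_images), test_labels))
```

The pretrained network stays frozen throughout; only the small classifier on top is trained, which is what makes the recipe cheap to apply to new tasks.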

The team behind the paper released the trained models from their experiments for others to use, and played a significant role in developing the Caffe library, one of the first open-source libraries to support GPU-based neural networks, which facilitated their use in both research and industry.

Many of the components that contributed to DeCAF’s success already existed when it was published:

  • Word2Vec had demonstrated the value of pretraining in NLP and the impact of publicly available pretrained models
  • AlexNet showed that deep neural networks can be stronger than hand-crafted features in computer vision

The open-source ML software ecosystem was also developing rapidly, but DeCAF combined these trends into a coherent demonstration of much of what the next ten years of machine learning would turn out to be.


Read more of our quantitative researchers’ thoughts

ICML 2024: Paper Review #1

Discover the perspectives of Yousuf, one of our machine learning engineers, on the following papers:

  • Arrows of Time for Large Language Models
  • Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
ICML 2024: Paper Review #2

Discover the perspectives of Danny, one of our machine learning engineers, on the following papers:

  • Compute Better Spent: Replacing Dense Layers with Structured Matrices
  • Emergent Equivariance in Deep Ensembles
ICML 2024: Paper Review #3

Discover the perspectives of Jonathan, one of our software engineers, on the following papers:

  • A Universal Class of Sharpness-Aware Minimization Algorithms
  • Rotational Equilibrium: How Weight Decay Balances Learning Across Neural Networks
ICML 2024: Paper Review #5

Discover the perspectives of Michael, one of our Scientific Directors, on the following papers:

  • Stop Regressing: Training Value Functions via Classification for Scalable Deep RL
  • Physics of Language Models: Part 3.1, Knowledge Storage and Extraction
ICML 2024: Paper Review #6

Discover the perspectives of Fabian, one of our senior quantitative researchers, on the following papers:

  • I/O Complexity of Attention, or How Optimal is Flash Attention?
  • Simple Linear Attention Language Models Balance the Recall-Throughput Tradeoff
ICML 2024: Paper Review #7

Discover the perspectives of Ingmar, one of our quantitative researchers, on the following papers:

  • Offline Actor-Critic Reinforcement Learning Scales to Large Models
  • Information-Directed Pessimism for Offline Reinforcement Learning
