Better by default: Strong pre-tuned MLPs and boosted trees on tabular data
David Holzmüller, Leo Grinsztajn, Ingo Steinwart
Over the years there have been endless papers benchmarking neural networks against boosted trees, usually concluding that trees are better for tabular data. This paper revisits that comparison and proposes a bag of tricks for improving simple neural network models.
The tricks include preprocessing methods, new learning rate schedules, additional scaling layers, and data-driven weight initialisations. Nothing in this paper is especially groundbreaking, but that is the point: getting all these details right is what yields good performance.
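To make two of these tricks concrete, here is a minimal sketch (not the authors' exact RealMLP code; the names `ScalingLayer` and `robust_scale_smooth_clip` and the clipping constants are illustrative assumptions) of a learnable per-feature scaling layer placed before the first linear layer, and a robust-scale-then-smooth-clip preprocessing step.

```python
# Hedged sketch of two tricks from the paper: a learnable per-feature
# scaling layer and robust scaling with smooth clipping. Names and
# constants here are assumptions, not the authors' implementation.
import numpy as np
import torch
import torch.nn as nn


class ScalingLayer(nn.Module):
    """Learnable per-feature scale applied before the first linear layer."""

    def __init__(self, num_features: int, init_scale: float = 1.0):
        super().__init__()
        self.scale = nn.Parameter(torch.full((num_features,), init_scale))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.scale


def robust_scale_smooth_clip(x: np.ndarray) -> np.ndarray:
    """Scale each column by its interquartile range, then softly clip outliers."""
    median = np.median(x, axis=0, keepdims=True)
    q1, q3 = np.percentile(x, [25, 75], axis=0, keepdims=True)
    iqr = np.where(q3 - q1 > 0, q3 - q1, 1.0)  # avoid division by zero
    z = (x - median) / iqr
    return z / np.sqrt(1.0 + (z / 3.0) ** 2)  # smooth clip to roughly (-3, 3)


# Usage: a small MLP with the scaling layer in front of the usual stack.
mlp = nn.Sequential(ScalingLayer(10), nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))
x = torch.tensor(robust_scale_smooth_clip(np.random.randn(32, 10)), dtype=torch.float32)
print(mlp(x).shape)  # torch.Size([32, 1])
```

The point of the scaling layer is to let the network learn a soft, differentiable form of feature selection, while the smooth clipping tames outliers without the hard discontinuity of ordinary clipping.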
Another contribution of the paper is a new benchmark of tabular datasets that is larger in scale than previous attempts. The paper also compares hyperparameter tuning against ensembling, and throughout makes comparisons across differing computational budgets.
This is a very practical paper that suggests many sensible ideas that practitioners will find useful.