Deep Learning Fundamentals Lab
Texas A&M University

Tomer Galanti

Assistant Professor · Computer Science & Engineering
Texas A&M University · galanti@tamu.edu · Office: 325 PETR
Research Focus

Building a rigorous science of modern AI

The Deep Learning Fundamentals group studies the principles that make modern AI systems work. Our research focuses on representation learning, self-supervised learning, and the foundations of reasoning in large language models. We combine mathematical analysis with empirical study to understand what structure modern models learn, how that structure is shaped by training, and why it leads to strong generalization and adaptation.

A central goal of the lab is to turn phenomena that are often treated as mysterious into precise, predictive theory: neural collapse, label-efficient transfer, implicit low-rank bias, and the semantic structure that emerges in self-supervised learning. We also study how pretrained language models can be used as components in principled, verifiable systems for searching over hypotheses, programs, and solution strategies.

Tomer Galanti is an Assistant Professor in the Department of Computer Science and Engineering at Texas A&M University. Prior to joining Texas A&M, he was a postdoctoral associate at MIT's Center for Brains, Minds & Machines, working with Tomaso Poggio. He received his Ph.D. from Tel Aviv University, advised by Lior Wolf, and was a Research Scientist Intern at Google DeepMind in 2021.

Selected Research Contributions
All publications →
01
Neural Collapse Explains Why Transfer Works
Why do pretrained models often transfer so well from only a few examples? In this work, this follow-up, and later extensions, we show that the geometry learned during pretraining can extend beyond the source classes and directly support few-shot generalization on new ones. A central quantity in this theory is the Class-Distance Normalized Variance (CDNV), which we introduce as a predictor of transfer error and use to connect neural collapse to a broader theory of representation transfer; a small computational sketch appears below.
ICLR 2022 · ICML WS 2022 · Neural Collapse · Transfer Learning · Few-Shot Learning
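A minimal sketch of how CDNV can be computed, assuming its usual definition as within-class feature variance normalized by the squared distance between class means; the array names and shapes below are illustrative, not code from the paper.

import numpy as np

def cdnv(feats_i, feats_j):
    # feats_i, feats_j: embeddings of two classes, shape (n_samples, dim).
    mu_i, mu_j = feats_i.mean(axis=0), feats_j.mean(axis=0)
    # Within-class variance: mean squared distance of each feature to its class mean.
    var_i = np.mean(np.sum((feats_i - mu_i) ** 2, axis=1))
    var_j = np.mean(np.sum((feats_j - mu_j) ** 2, axis=1))
    # Normalize by the squared distance between the two class means;
    # smaller values indicate stronger collapse and predict lower transfer error.
    return (var_i + var_j) / (2.0 * np.sum((mu_i - mu_j) ** 2))

Averaging this quantity over pairs of target classes gives a single number that can be tracked during pretraining or compared across checkpoints.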
02
Self-Supervised Contrastive Learning Is Closer to Supervised Learning Than It Seems
In this paper and this follow-up, we show that standard contrastive learning closely tracks a supervised surrogate, both at the level of the loss and throughout training in representation space. This provides a principled explanation for why contrastive pretraining can recover semantic structure so effectively without labels, and helps narrow the conceptual gap between self-supervised and supervised learning.
NeurIPS 2025 · ICLR 2026 · Self-Supervised Learning · Representation Theory
03
Directional Neural Collapse in Self-Supervised Learning
Self-supervised learning succeeds in the few-shot regime not because features collapse everywhere, but because they collapse in the directions that matter for decision-making. In this paper and this follow-up, we identify directional CDNV, the variability along class-separating directions, as the key quantity behind strong transfer. The same view also explains task orthogonalization, where distinct tasks occupy nearly orthogonal decision axes and therefore interfere little with one another. An illustrative sketch of the directional quantity appears below.
NeurIPS 2025 · Preprint 2026 · Self-Supervised Learning · Neural Collapse · Transfer Learning
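The directional variant can be illustrated with a small change to the sketch above: measure variability only along a class-separating direction. The choice of the mean-difference axis here is an illustrative stand-in for the directions studied in the paper, not the paper's exact definition.

import numpy as np

def directional_cdnv(feats_i, feats_j):
    mu_i, mu_j = feats_i.mean(axis=0), feats_j.mean(axis=0)
    # Illustrative class-separating direction: the normalized mean-difference axis.
    u = (mu_i - mu_j) / np.linalg.norm(mu_i - mu_j)
    # Variance of the 1-D projections onto that direction, per class.
    var_i = np.var(feats_i @ u)
    var_j = np.var(feats_j @ u)
    # Squared distance between the projected class means.
    dist_sq = np.dot(mu_i - mu_j, u) ** 2
    return (var_i + var_j) / (2.0 * dist_sq)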
04
SGD and Weight Decay Implicitly Minimize Rank
In this paper, we show that mini-batch SGD together with weight decay induces an implicit bias toward low-rank weight matrices across a broad class of modern architectures. The theory predicts that this effect strengthens with smaller batch sizes, larger learning rates, and stronger weight decay, suggesting that compressibility is not merely a byproduct of training but something optimization actively shapes. A toy probe of the effect is sketched below.
CPAL 2025 · Implicit Regularization · Generalization
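The effect is easy to probe empirically. The toy script below is not the paper's experimental setup; the architecture, hyperparameters, and rank threshold are illustrative. It trains a small network with mini-batch SGD and weight decay and tracks a thresholded rank of the first-layer weights.

import torch
import torch.nn as nn

def effective_rank(W, tol=1e-3):
    # Count singular values above tol times the largest one.
    s = torch.linalg.svdvals(W)
    return int((s > tol * s[0]).sum())

torch.manual_seed(0)
X, y = torch.randn(512, 32), torch.randint(0, 4, (512,))
model = nn.Sequential(nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, 4))
# Small batches, a large learning rate, and nonzero weight decay are the regime
# in which the low-rank bias is predicted to be strongest.
opt = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=5e-3)
for step in range(2001):
    idx = torch.randint(0, 512, (16,))
    loss = nn.functional.cross_entropy(model(X[idx]), y[idx])
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 500 == 0:
        print(step, effective_rank(model[0].weight.detach()))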
05
Hypernetworks Are Fundamentally More Modular
Hypernetworks condition a model by generating its weights rather than merely feeding context into a fixed predictor. In this paper, we give a theoretical explanation for why this can be fundamentally more powerful, formalizing modularity as the ability to efficiently realize a different function for each conditioning input. Under structured assumptions, we show that hypernetworks can achieve this with dramatically fewer parameters than standard and embedding-based alternatives; a toy sketch of the mechanism appears below.
NeurIPS 2020 Oral · Hypernetworks · Expressivity
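A minimal sketch of the basic idea in a toy setting: a generator maps a conditioning code to the weights of a linear target layer. This is for intuition only and is not the construction analyzed in the paper.

import torch
import torch.nn as nn

class TinyHypernet(nn.Module):
    # A generator network maps a conditioning code c to the weights and bias
    # of a target linear map, so each c realizes a different function of x.
    def __init__(self, cond_dim, in_dim, out_dim):
        super().__init__()
        self.in_dim, self.out_dim = in_dim, out_dim
        self.gen = nn.Linear(cond_dim, out_dim * in_dim + out_dim)

    def forward(self, x, c):
        params = self.gen(c)
        W = params[: self.out_dim * self.in_dim].view(self.out_dim, self.in_dim)
        b = params[self.out_dim * self.in_dim:]
        return x @ W.t() + b

net = TinyHypernet(cond_dim=8, in_dim=16, out_dim=4)
x, c = torch.randn(5, 16), torch.randn(8)
print(net(x, c).shape)  # torch.Size([5, 4]); a new linear map for every code c

Contrast this with an embedding-based baseline that concatenates c to x and feeds both through a fixed network: there, changing c shifts the input but cannot re-wire the computation itself.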
06
LLMs Make Program Learning Computationally Feasible
In this paper, we show that pretrained LLMs can make empirical risk minimization (ERM) over programs practical through a simple propose-and-verify pipeline: candidate programs are proposed, executed, and selected by verification on data (sketched below). This bridges a central gap in program learning: classical ERM over programs is statistically attractive but computationally intractable, while gradient-based methods are computationally convenient but can miss simple underlying rules. Across tasks such as parity, pattern matching, and primality, the method often recovers the exact rule from small labeled sets and generalizes far beyond the training regime.
Preprint · LLMs · Program Learning · Reasoning
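A minimal sketch of the propose-and-verify loop described above. Here propose_candidates is a hypothetical placeholder for the LLM proposal step (not an API from the paper), and candidates are assumed to be Python source strings defining a function f.

def propose_candidates(task_description, n=8):
    # Hypothetical placeholder: query an LLM for n candidate programs,
    # returned as Python source strings that each define a function f(x).
    raise NotImplementedError("LLM proposal step goes here")

def empirical_risk(source, examples):
    # Execute a candidate program and measure its error on labeled examples.
    scope = {}
    try:
        exec(source, scope)
        f = scope["f"]
        return sum(f(x) != y for x, y in examples) / len(examples)
    except Exception:
        return float("inf")  # programs that crash are rejected by verification

def erm_over_programs(task_description, examples):
    # Propose, execute, and keep the candidate with the lowest empirical risk.
    candidates = propose_candidates(task_description)
    return min(candidates, key=lambda src: empirical_risk(src, examples))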
Teaching
Special Topics in Recent Developments in Deep Learning & LLMs · Texas A&M University · Fall 2025
Introduction to Machine Learning · Texas A&M University · Spring 2025
Special Topics in Recent Developments in Deep Learning & LLMs · Texas A&M University · Fall 2024
Statistical Learning Theory and its Applications · Massachusetts Institute of Technology · Fall 2023
Statistical Learning Theory and its Applications · Massachusetts Institute of Technology · Fall 2022
Deep Convolutional Neural Networks · Tel Aviv University · Spring 2020
Deep Convolutional Neural Networks · Tel Aviv University · Spring 2019