Deep Learning Fundamentals Lab
Texas A&M University

Tomer Galanti

Assistant Professor · Computer Science & Engineering
Texas A&M University · galanti@tamu.edu · Office: 325 PETR

Tomer Galanti is an Assistant Professor in the Department of Computer Science and Engineering at Texas A&M University. Prior to joining Texas A&M, he was a postdoctoral associate at MIT's Center for Brains, Minds & Machines, working with Tomaso Poggio. He received his Ph.D. from Tel Aviv University, advised by Lior Wolf, and interned as a Research Scientist at Google DeepMind in 2021.

Research Focus

Foundations of reusable intelligence

Our research develops foundations for reusable intelligence: AI systems that discover latent structure and turn it into reliable computation. We study when learning recovers information that can be reused beyond the training problem: transferred across tasks, amplified through reasoning, verified for correctness, or compiled into efficient procedures. Our work advances this agenda through two technical programs. The first is geometric laws of representation reuse, covering supervised pretraining and neural collapse as well as self-supervised pretraining and directional collapse. The second is distribution-aware programming, where samples from a task distribution are used to synthesize efficient, specialized algorithms.

Selected Research Contributions
01
LLM-PV · propose-and-verify program learning
A central question in program learning is whether a model has recovered the underlying rule or merely interpolated the observed examples. Classical theory shows that short programs can be identified from few samples, but the required search is computationally expensive; gradient-based training avoids explicit search, but can require dramatically more data. LLM-PV addresses this gap by using a pretrained LLM as a structured program prior and independent verification as the final selection criterion — without gradient updates or exhaustive enumeration (ICML 2025). From roughly 200 examples, the method recovers exact rules for parity, palindromes, Dyck-2, and primality, and generalizes well beyond the training lengths, while SGD-trained transformers fail to generalize reliably even with 100k examples.
ICML 2025 · LLMs · Program Learning · Reasoning
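The propose-and-verify idea described for LLM-PV can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: `propose_candidates` is a hypothetical stand-in that returns a fixed list where an LLM would generate programs, and the acceptance rule (agree with every training example, then with longer held-out strings) is a simplification rather than the procedure from the paper.

```python
from itertools import product
from typing import Callable, List, Optional, Tuple

Example = Tuple[str, bool]
Program = Callable[[str], bool]


def parity(bits: str) -> bool:
    """Ground-truth rule for the toy task: an odd number of 1s."""
    return bits.count("1") % 2 == 1


def propose_candidates() -> List[Tuple[str, Program]]:
    """Hypothetical stand-in for the LLM proposer (a fixed list here)."""
    return [
        ("starts_with_1", lambda s: s.startswith("1")),
        ("majority_ones", lambda s: 2 * s.count("1") > len(s)),
        ("odd_ones", lambda s: s.count("1") % 2 == 1),
    ]


def consistent(program: Program, examples: List[Example]) -> bool:
    """Verification step: the program must reproduce every label."""
    return all(program(x) == y for x, y in examples)


def propose_and_verify(train: List[Example],
                       held_out: List[Example]) -> Optional[str]:
    """Return the name of the first candidate that passes both checks."""
    for name, program in propose_candidates():
        if consistent(program, train) and consistent(program, held_out):
            return name
    return None  # no candidate survived verification


# Training examples: every bit string of length 1 to 4.
train = [("".join(b), parity("".join(b)))
         for n in range(1, 5) for b in product("01", repeat=n)]
# Held-out examples are longer than anything seen in training.
held_out = [(s, parity(s)) for s in ("1" * 9, "10" * 6, "0" * 11 + "1")]

selected = propose_and_verify(train, held_out)
print(selected)
```

In this toy setting the two shortcut candidates are rejected by the short training strings alone; the longer held-out strings play the role of the independent check on length generalization.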
02
Distribution-Aware Programming · learning solvers from a task distribution
Many computational problems arise not as isolated instances, but as repeated draws from a stable deployment distribution. Standard algorithms are typically designed for worst-case correctness over broad input classes, and therefore do not exploit the recurring structure present in a particular operational setting. Distribution-aware programming formalizes an alternative: examples from the deployment distribution are used to infer reusable structure and compile it into specialized solver code (preprint). Across 21 structured optimization distributions, the synthesized solvers achieve 0.971 mean normalized quality while running 564.9× faster than strong heuristics, 345.1× faster than Gurobi, and 16.9× faster than exact backends.
Preprint 2026 · Distribution-Aware Programming · Algorithm Design · LLM Agents
03
Agentic Boosting · weak reasoners, boosted to frontier performance
Weak language-model agents often generate correct solutions among their samples, but lack a reliable mechanism for selecting them. This suggests that the main obstacle is not always generation quality, but verification and selection. Agentic boosting turns weak model calls into stronger systems by aggregating candidates only when selection is supported by an independent soundness signal, such as execution, tests, proofs, or type checks (preprint). On SWE-bench Verified, GPT-5.4 nano improves from 67.0% to 76.4% at k=8, matching standalone Gemini 3 Pro and Claude Opus 4.5 Thinking, and approaching the 79.0% oracle ceiling.
Preprint 2026 · LLM Agents · Reasoning · Boosting
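The selection principle behind agentic boosting can be sketched in a few lines. This is a hedged toy, not the system from the preprint: the candidates are hand-written functions standing in for sampled model outputs, the soundness signal is a small hypothetical unit-test suite, and the selector abstains when no candidate is supported by the tests.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Candidate = Callable[[int], int]
TestCase = Tuple[int, int]  # (input, expected output)


def passes_tests(candidate: Candidate, tests: Sequence[TestCase]) -> bool:
    """Soundness signal: run the candidate; a crash counts as a failure."""
    for arg, expected in tests:
        try:
            if candidate(arg) != expected:
                return False
        except Exception:
            return False
    return True


def select_verified(candidates: Sequence[Candidate],
                    tests: Sequence[TestCase]) -> Optional[int]:
    """Return the index of the first candidate the tests support, else None."""
    for index, candidate in enumerate(candidates):
        if passes_tests(candidate, tests):
            return index
    return None  # abstain rather than guess


# Hypothetical samples for the task "absolute value of an integer".
samples: List[Candidate] = [
    lambda x: x,                     # wrong for negative inputs
    lambda x: abs(x) // x * x,       # crashes on zero
    lambda x: x if x >= 0 else -x,   # correct
]
tests: List[TestCase] = [(3, 3), (-2, 2), (0, 0)]

chosen = select_verified(samples, tests)
print(chosen)
```

Returning `None` when nothing passes reflects the point made above: aggregation helps only when an independent check backs the selection, so the sketch declines to pick among unverified candidates.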
04
CDNV · a geometric predictor of transfer
Pretrained representations can support accurate few-shot transfer to novel classes, but the geometric conditions enabling this behavior were not well understood. We identified Class-Distance Normalized Variance (CDNV) as a central quantity governing transfer: it measures within-class concentration relative to between-class separation and yields an upper bound on few-shot transfer error (ICLR 2022, JMLR 2026). This provides a predictive theory connecting neural-collapse geometry to the transfer behavior of learned representations.
ICLR 2022 · JMLR 2026 · Neural Collapse · Transfer Learning · CDNV
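To make the quantity concrete, here is a minimal sketch that computes CDNV for two classes of feature vectors. It assumes the commonly stated form, (Var(A) + Var(B)) / (2 · ||mean(A) − mean(B)||²), where Var is the mean squared distance of a class's features to the class mean; the page itself does not spell out the formula, and the toy feature values are made up.

```python
from typing import List, Sequence

Vector = Sequence[float]


def class_mean(features: Sequence[Vector]) -> List[float]:
    """Coordinate-wise mean of a class's feature vectors."""
    count = len(features)
    dim = len(features[0])
    return [sum(v[k] for v in features) / count for k in range(dim)]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def class_variance(features: Sequence[Vector]) -> float:
    """Mean squared distance of the features to their class mean."""
    mu = class_mean(features)
    return sum(squared_distance(v, mu) for v in features) / len(features)


def cdnv(class_a: Sequence[Vector], class_b: Sequence[Vector]) -> float:
    """Within-class variance relative to squared between-class distance."""
    spread = class_variance(class_a) + class_variance(class_b)
    gap = squared_distance(class_mean(class_a), class_mean(class_b))
    return spread / (2.0 * gap)


# Same class means in both cases; the second pair is more concentrated.
loose = cdnv([[0.0, 0.0], [0.0, 2.0]], [[4.0, 0.0], [4.0, 2.0]])
tight = cdnv([[0.0, 0.5], [0.0, 1.5]], [[4.0, 0.5], [4.0, 1.5]])
print(loose, tight)
```

Shrinking the within-class spread while holding the class means fixed lowers the value, which matches the reading of CDNV as concentration relative to separation.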
05
NSCL · contrastive learning is approximately supervised learning
Self-supervised contrastive learning often recovers semantic representations comparable to those learned with labels, but the source of this supervision-like behavior remained theoretically unclear. We showed that contrastive learning is closely related to a supervised contrastive objective with only negative labels, formalized as the negatives-only supervised contrastive loss (NSCL). The gap between the two objectives vanishes as the number of classes grows, under a bound that is label-agnostic and architecture-independent (NeurIPS 2025, ICLR 2026).
NeurIPS 2025 · ICLR 2026 · Self-Supervised Learning · Representation Theory
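The relationship between the two objectives can be illustrated with a small sketch. It assumes an InfoNCE-style loss with dot-product similarity and a temperature, and it models the negatives-only variant by dropping same-class samples from the denominator; these choices, the function name `cl_and_nscl`, and the toy embeddings are assumptions for illustration, not the exact formulation from the papers.

```python
import math
from typing import Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cl_and_nscl(anchors: Sequence[Vector],
                positives: Sequence[Vector],
                labels: Sequence[int],
                temperature: float = 0.5) -> Tuple[float, float]:
    """Average contrastive loss and its negatives-only supervised variant."""
    n = len(anchors)
    cl_total, nscl_total = 0.0, 0.0
    for i in range(n):
        def score(j: int) -> float:
            return math.exp(dot(anchors[i], positives[j]) / temperature)

        positive = score(i)
        # Contrastive loss: every other sample acts as a negative.
        all_negatives = sum(score(j) for j in range(n) if j != i)
        # Negatives-only variant: only samples from other classes remain.
        other_class = sum(score(j) for j in range(n) if labels[j] != labels[i])
        cl_total -= math.log(positive / (positive + all_negatives))
        nscl_total -= math.log(positive / (positive + other_class))
    return cl_total / n, nscl_total / n


anchors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
positives = [[0.95, 0.05], [0.85, 0.15], [0.05, 0.95], [0.15, 0.85]]

cl, nscl = cl_and_nscl(anchors, positives, labels=[0, 0, 1, 1])
cl_distinct, nscl_distinct = cl_and_nscl(anchors, positives, labels=[0, 1, 2, 3])
print(cl - nscl, cl_distinct - nscl_distinct)
```

Under these assumptions the contrastive loss is never smaller than the negatives-only loss, because its denominator contains extra same-class terms, and the two coincide when every sample has its own class.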
06
Directional Neural Collapse · collapse only where it matters
The geometry of supervised neural collapse does not directly explain self-supervised representations, whose features retain substantial global variance while still enabling strong few-shot transfer. We showed that the relevant phenomenon is directional: features concentrate along the class-separating directions, even when they do not collapse globally. The resulting quantity, directional CDNV, predicts transfer performance and is reduced by contrastive training by roughly an order of magnitude, while global variance remains largely intact (NeurIPS 2025, ICML 2026).
NeurIPS 2025 · ICML 2026 · Self-Supervised Learning · Neural Collapse
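A small sketch can show how a directional measure differs from a global one. It assumes that directional CDNV for class A relative to class B is the variance of A's features projected onto the unit vector between the two class means, divided by the squared distance between the means; the page does not state the formula, so this one-sided definition and its normalization are assumptions.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def class_mean(features: Sequence[Vector]) -> List[float]:
    count = len(features)
    dim = len(features[0])
    return [sum(v[k] for v in features) / count for k in range(dim)]


def directional_cdnv(class_a: Sequence[Vector],
                     class_b: Sequence[Vector]) -> float:
    """Variance of class A along the mean-difference direction, normalized."""
    mu_a, mu_b = class_mean(class_a), class_mean(class_b)
    gap = [x - y for x, y in zip(mu_a, mu_b)]
    gap_squared = sum(g * g for g in gap)
    norm = math.sqrt(gap_squared)
    unit = [g / norm for g in gap]  # class-separating direction
    projections = [
        sum((x - m) * u for x, m, u in zip(v, mu_a, unit)) for v in class_a
    ]
    variance_along = sum(p * p for p in projections) / len(class_a)
    return variance_along / gap_squared


class_b = [[4.0, 0.0], [4.0, 2.0]]
# Class A varies only orthogonally to the direction separating the means.
orthogonal = directional_cdnv([[0.0, 0.0], [0.0, 2.0]], class_b)
# Class A varies along the direction separating the means.
aligned = directional_cdnv([[-1.0, 1.0], [1.0, 1.0]], class_b)
print(orthogonal, aligned)
```

Both versions of class A have the same total variance around the same mean, yet only the aligned one registers in the directional measure, which is the sense in which features can stay spread out globally while concentrating where it matters.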
07
Implicit Rank Minimization · SGD's hidden low-rank bias
Low-rank structure is pervasive in modern neural networks and underlies practical techniques such as compression and LoRA-style adaptation, yet its origins are often treated as empirical. We proved that mini-batch SGD with weight decay induces an implicit bias toward low-rank weights across a broad class of modern architectures. The effect becomes stronger with smaller batch sizes, larger learning rates, and stronger weight decay, providing a mechanism for the emergence of low-rank structure during training (CPAL 2025).
CPAL 2025 · Implicit Regularization · Generalization
Teaching
Special Topics in Recent Developments in Deep Learning & LLMs · Texas A&M University · Fall 2025
Introduction to Machine Learning · Texas A&M University · Spring 2025
Special Topics in Recent Developments in Deep Learning & LLMs · Texas A&M University · Fall 2024
Statistical Learning Theory and its Applications · Massachusetts Institute of Technology · Fall 2023
Statistical Learning Theory and its Applications · Massachusetts Institute of Technology · Fall 2022
Deep Convolutional Neural Networks · Tel Aviv University · Spring 2020
Deep Convolutional Neural Networks · Tel Aviv University · Spring 2019