Physics of AI

We propose an approach to the science of deep learning that roughly follows what physicists do to understand reality: (1) explore phenomena through controlled experiments, and (2) build theories based on toy mathematical models and not fully rigorous mathematical reasoning. I will illustrate (1) with the LEGO study (LEGO stands for Learning Equality and Group Operations), where we observe how transformers learn to solve simple linear systems of equations. I will also briefly illustrate (2) with an analysis of the emergence of threshold units when training a two-layer neural network to solve a simple sparse coding problem. The latter analysis connects to the recently discovered Edge of Stability phenomenon.

Date:
Speakers:
Sébastien Bubeck
Affiliation:
Microsoft Research