Matus Telgarsky.

Hello, Friend.

My name is Matus Telgarsky (pic).

I like math; I study deep learning.

Selected work (see also arXiv and Google Scholar).

  1. (2013.) Coordinate descent methods (and steepest descent, via the same proof) converge to maximum margin solutions.

  2. (2016.) There exist deep networks that shallow networks cannot approximate without exponential size; a sketch of the construction follows this list.
    (See also an earlier, simpler construction, and a more didactic version in my lecture notes.)

  3. (2018.) Classical margin-based generalization theory also works for deep networks.

  4. (2019.) A large-margin analysis of gradient descent, even for non-separable data.
    (See also arXiv v2, which contains a succinct SGD proof with a 1/t rate.)

  5. (2020.) The preceding “v2” analysis, but now for shallow ReLU networks near initialization.

  6. (2020.) Gradient descent converges “in direction” for deep ReLU networks and friends.

  7. (2024.) (Decoder-)Transformer layers are parallel computation rounds.

  8. (2024.) Logistic regression doesn’t care about the step size.


I am sort of writing a deep learning theory textbook (new version, old version).

Courses.

Courant (2023-).

  1. Here is my spring 2024 course webpage.

  2. Here is my fall 2023 course webpage.


UIUC (2016-2023).

  1. Deep learning theory (CS 540 / CS 598 DLT): fall 2022, fall 2021, fall 2020, fall 2019.
    Deep learning theory lecture notes: new version, old version.

  2. Machine learning (CS 446): spring 2022, spring 2021, spring 2019, spring 2018.
    Some course materials.

  3. Machine learning theory (CS 598 TEL): fall 2018, fall 2017, fall 2016.

Comfy.