My name is Matus Telgarsky (pic).
I like math; I study deep learning.
(2016.) There exist deep networks that cannot be approximated by shallow networks unless the shallow networks have exponential size.
(See also an earlier, simpler construction, and a more didactic version in my lecture notes.)
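Roughly (my paraphrase; see the paper for the exact constants): for every k there is an f computed by a ReLU network with O(k^3) layers and O(1) nodes per layer such that
\[
\inf_{\substack{g:\ \le k \text{ layers},\\ \ \le 2^{k} \text{ nodes}}} \int_{[0,1]} \lvert f(x) - g(x)\rvert \,dx \;\ge\; c
\]
for an absolute constant c > 0.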
(2018.) Classical margin-based generalization theory also works for deep networks.
(2019.) A large-margin analysis of gradient descent, even for non-separable data.
(See also arXiv v2, which contains a succinct SGD proof with a 1/t rate.)
(2020.) The preceding “v2”, but now for shallow ReLU networks near initialization.
(2020.) Gradient descent converges “in direction” for deep ReLU networks and friends.
(2024.) (Decoder-)Transformer layers are parallel computation rounds.
(2024.) Logistic regression doesn’t care about the step size.
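A minimal numpy sketch of that setting (my own toy, not the paper's experiments): gradient descent on the logistic loss over linearly separable data, run with several constant step sizes including very large ones, to check how little the final loss depends on the choice.

import numpy as np
from scipy.special import expit  # numerically stable 1 / (1 + exp(-z))

def logistic_loss(w, X, y):
    # mean logistic loss with labels y in {-1, +1}
    return np.mean(np.logaddexp(0.0, -y * (X @ w)))

def logistic_grad(w, X, y):
    # gradient of the mean logistic loss
    coeffs = -y * expit(-y * (X @ w))
    return (X * coeffs[:, None]).mean(axis=0)

rng = np.random.default_rng(0)
n, d = 100, 5
X = rng.normal(size=(n, d))
y = np.sign(X @ rng.normal(size=d))          # linearly separable labels

for eta in [0.1, 1.0, 10.0, 100.0]:          # step sizes, modest to very large
    w = np.zeros(d)
    for _ in range(2000):
        w = w - eta * logistic_grad(w, X, y)
    print(f"eta={eta:7.1f}  loss={logistic_loss(w, X, y):.3e}")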
I am sort of writing a deep learning theory textbook (new version, old version).
Courant (2023-).
UIUC (2016-2023).
Deep learning theory (CS 540 / CS 598 DLT): fall 2022, fall 2021, fall 2020, fall 2019.
Deep learning theory lecture notes: new version, old version.
Machine learning (CS 446): spring 2022, spring 2021, spring 2019,
spring 2018.
Some course materials.
Machine learning theory (CS 598 TEL): fall 2018, fall 2017, fall 2016.
Ziwei Ji (吉梓玮), Fanny Yang, Po-Ling Loh, Daniel Hsu, Sanjoy Dasgupta, Jeroen Rombouts, Maxim Raginsky, Lana Lazebnik, …
Midwest ML Symposium (with Po-Ling Loh!!), Simons Institute Program on Deep Learning, Simons Institute Program on Generalization (with Po-Ling Loh!!!!).
Violin (violin! (violin!!)), desks (desks! (desks!! (desks!!!))), crackberg, scifi books, pencils.