CILVR Seminar: Deep learning from an information theory perspective

Speaker: Ravid Shwartz-Ziv

Location: 60 Fifth Avenue, 7th floor

Date: Wednesday, November 17, 2021

 

While DNNs have achieved many breakthroughs, our understanding of their internal structure, optimization process, and generalization remains limited, and we often treat them as black boxes. We attempt to resolve these issues by suggesting that DNNs learn to optimize the Information Bottleneck (IB) principle: the tradeoff between information compression and prediction quality.
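For reference, the IB principle is commonly formalized as a trade-off between how much information a representation T retains about the input X and how much it preserves about the label Y:

\[
\min_{p(t \mid x)} \; I(X;T) \;-\; \beta\, I(T;Y),
\]

where larger values of the trade-off parameter \beta favor prediction over compression.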
The first part of the talk presents this approach through an analytical and numerical study of DNNs in the information plane. This analysis reveals how the training process compresses the input into an optimal, efficient representation. The talk then discusses recent works inspired by this analysis and shows how they can be applied to real-world problems. The second part examines information in infinitely wide neural networks using recent results on the Neural Tangent Kernel (NTK). The NTK makes many information-theoretic quantities tractable to derive; using these derivations, we can conduct an empirical search for the information-theoretic quantities that affect generalization in DNNs.
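As a rough illustration of the kind of numerical information-plane analysis mentioned above, the sketch below estimates I(X;T) for one layer's activations with a simple equal-width binning (plug-in) estimator. This is a minimal sketch, not the speaker's implementation; the bin count, array shapes, and use of NumPy are assumptions.

import numpy as np

def binned_mutual_information(x_ids, activations, n_bins=30):
    # Discretize each activation value into equal-width bins over the global range.
    lo, hi = activations.min(), activations.max()
    bins = np.floor((activations - lo) / (hi - lo + 1e-12) * n_bins).astype(int)
    # Treat each distinct row of binned activations as one discrete value of T.
    _, t_ids = np.unique(bins, axis=0, return_inverse=True)
    t_ids = t_ids.ravel()

    def entropy(ids):
        _, counts = np.unique(ids, return_counts=True)
        p = counts / counts.sum()
        return float(-(p * np.log2(p)).sum())

    # Plug-in estimate of I(X;T) = H(T) - H(T|X).
    h_t = entropy(t_ids)
    h_t_given_x = 0.0
    for x in np.unique(x_ids):
        mask = x_ids == x
        h_t_given_x += mask.mean() * entropy(t_ids[mask])
    return h_t - h_t_given_x

# Hypothetical usage: 200 distinct inputs, 5 samples each, one 10-unit tanh layer.
rng = np.random.default_rng(0)
x_ids = np.repeat(np.arange(200), 5)
activations = np.tanh(rng.normal(size=(1000, 10)))
print(binned_mutual_information(x_ids, activations))

Binned estimates of this kind are sensitive to the number of bins; in the infinite-width (NTK) regime discussed in the second part, many such quantities instead admit tractable closed forms.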
If time permits, the talk will also present more recent work, the Dual Information Bottleneck (dualIB) framework, which seeks an optimal representation while resolving some of the drawbacks of the original IB. A theoretical analysis of the dualIB reveals the structure of its solutions and its ability to preserve the statistics of the original distribution. In particular, the talk will focus on the variational form of the dualIB, which allows it to be applied to DNNs.
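For context on what such a variational form looks like when applied to DNNs, the standard variational bound on the original IB (the deep variational IB) trains a stochastic encoder p(t|x), a variational decoder q(y|t), and a variational marginal r(t) by minimizing

\[
\mathcal{L} \;=\; \mathbb{E}_{p(x,y)}\,\mathbb{E}_{p(t \mid x)}\!\left[-\log q(y \mid t)\right] \;+\; \beta\, \mathbb{E}_{p(x)}\!\left[ D_{\mathrm{KL}}\!\big(p(t \mid x)\,\|\,r(t)\big) \right],
\]

where \beta here weights the compression term (the reciprocal convention to the Lagrangian above). The variational dualIB mentioned in the talk presumably follows a similar recipe of replacing intractable terms with parameterized bounds, though its exact objective differs.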