CILVR Seminar: Entropy and Private Language Models

Speaker: Nandan Kumar Jha

Location: 60 Fifth Avenue, 7th Floor Open Space
Videoconference link: https://nyu.zoom.us/s/94885227670

Date: Wednesday, April 9, 2025

Running large language models on sensitive data demands private inference (PI), where computation is performed on encrypted inputs without exposing their content. However, nonlinear operations like Softmax, GELU, and LayerNorm inflate memory and latency under cryptographic constraints, making PI impractical. To address this, we introduce an information-theoretic framework that employs Shannon’s entropy to characterize the role of nonlinearities in decoder-only language models. We show that, beyond ensuring training stability, these nonlinearities also preserve attention-head diversity. Their absence causes two key failure modes: (1) entropy collapse in deeper layers, which destabilizes training, and (2) entropic overload in earlier layers, which leaves multi-head attention underutilized. We overcome these issues with an entropy-guided attention mechanism that pairs an entropy regularization technique, to mitigate entropic overload, with PI-friendly parametric normalization techniques, such as weight and spectral normalization, to prevent entropy collapse.
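
To make the entropy quantities in the abstract concrete, below is a minimal sketch (not the speaker's implementation, and written here only as an illustration): it computes the per-head Shannon entropy of post-Softmax attention weights and a simple penalty that discourages entropic overload, i.e. heads whose entropy sits near the uniform-attention maximum. The function names, the penalty form, and the hyperparameters are assumptions made for this example; the weight/spectral normalization lines at the end use standard PyTorch utilities and stand in for the PI-friendly parametric normalizations the abstract mentions.

import torch

def attention_entropy(attn_weights: torch.Tensor, eps: float = 1e-9) -> torch.Tensor:
    # attn_weights: (batch, heads, queries, keys); each row sums to 1 after Softmax.
    # Returns the Shannon entropy of each head's attention distribution,
    # averaged over the batch and query positions: shape (heads,).
    ent = -(attn_weights * (attn_weights + eps).log()).sum(dim=-1)
    return ent.mean(dim=(0, 2))

def entropy_overload_penalty(attn_weights: torch.Tensor, num_keys: int, weight: float = 0.01) -> torch.Tensor:
    # Penalize heads whose entropy approaches the uniform-attention maximum log(num_keys),
    # i.e. heads that attend almost uniformly and contribute little head diversity.
    head_entropy = attention_entropy(attn_weights)
    max_entropy = torch.log(torch.tensor(float(num_keys)))
    return weight * (head_entropy / max_entropy).mean()

# Example: random post-Softmax attention weights for 2 sequences, 8 heads, 16 tokens.
attn = torch.softmax(torch.randn(2, 8, 16, 16), dim=-1)
print(attention_entropy(attn))                      # one entropy value per head
print(entropy_overload_penalty(attn, num_keys=16))  # scalar regularization term

# Parametric normalization of a projection layer via PyTorch's built-in
# weight and spectral normalization (illustrative stand-ins, not the exact method).
proj_wn = torch.nn.utils.parametrizations.weight_norm(torch.nn.Linear(512, 512))
proj_sn = torch.nn.utils.parametrizations.spectral_norm(torch.nn.Linear(512, 512))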