MLTea: Data representation in deep random neural networks
Speaker: Hadi Daneshmand (LIDS)
Host: Behrooz Tahmasebi
Abstract: Depth plays an essential role in the performance of neural networks: increasing depth boosts performance across a wide range of applications and shapes the expressive power, optimization, and generalization of these networks. This talk focuses on the dynamics of representations at initialization. The representations across layers form a Markov chain whose properties significantly influence optimization with gradient descent, and the neural architecture determines the properties of this chain.
We demonstrate that batch normalization layers, important building blocks of modern neural networks, bias the hidden representations toward an (almost) orthogonal matrix as depth increases when the activations are linear [1]. For networks with non-linear activations, we show that mean-field approximations accurately predict the spectra of the representations. Since the mean-field approximation incurs an error at each layer, one might expect this prediction error to propagate with depth. However, we prove that the error does not accumulate with depth when the chain of representations is geometrically ergodic [2]. This allows us to establish non-asymptotic concentration bounds for mean-field predictions. Inspired by this result, we advocate bridging the gap between mean-field and finite-width neural networks.
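The orthogonalization effect described in [1] is easy to probe numerically. The following minimal sketch (not the authors' code) stacks random linear layers followed by batch normalization and tracks how far the normalized Gram matrix of a batch is from the identity; the width, batch size, depth, and the specific orthogonality measure are illustrative assumptions.

```python
# Minimal sketch: batch normalization in a deep random *linear* network drives the
# batch Gram matrix toward (a scaled) identity, i.e. representations become almost
# orthogonal with depth. All parameter values below are assumed for illustration.
import numpy as np

rng = np.random.default_rng(0)
width, batch, depth = 64, 8, 50   # hidden width, batch size, number of layers (assumed)

def batch_norm(H):
    """Normalize each feature (row) to zero mean and unit variance across the batch."""
    H = H - H.mean(axis=1, keepdims=True)
    return H / (H.std(axis=1, keepdims=True) + 1e-8)

def orthogonality_gap(H):
    """Relative distance of the trace-normalized batch Gram matrix from the identity."""
    G = H.T @ H
    G = G / np.trace(G) * batch   # an exactly orthogonal, equal-norm batch gives G = I
    return np.linalg.norm(G - np.eye(batch)) / np.linalg.norm(np.eye(batch))

H = rng.standard_normal((width, batch))      # random input batch
for layer in range(depth):
    W = rng.standard_normal((width, width)) / np.sqrt(width)   # random Gaussian weights
    H = batch_norm(W @ H)                    # linear layer followed by batch normalization
    if (layer + 1) % 10 == 0:
        print(f"layer {layer + 1:3d}: orthogonality gap = {orthogonality_gap(H):.4f}")
```

Under this setup the printed gap shrinks as depth grows, consistent with the orthogonalization result of [1].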
References.
[1] Hadi Daneshmand, Amir Joudaki, Francis Bach. Batch Normalization Orthogonalizes Representations in Deep Random Networks. NeurIPS 2021 (spotlight).
[2] Amir Joudaki, Hadi Daneshmand, Francis Bach. On Bridging the Gap between Mean Field and Finite Width in Deep Random Neural Networks with Batch Normalization. arXiv preprint, 2022.