MLTea: Data representation in deep random neural networks
Speaker: Hadi Daneshmand (LIDS)
Host: Behrooz Tahmasebi
Abstract: Depth plays an essential role in the performance of neural networks: increasing depth boosts performance across a wide range of applications and shapes the expressive power, optimization, and generalization of these networks. This talk focuses on the dynamics of representations at initialization. The representations across layers form a Markov chain whose properties significantly influence optimization with gradient descent, and the neural architecture determines the properties of this chain.
We demonstrate that batch normalization layers, important building blocks of modern neural networks, bias the hidden representations toward an (almost) orthogonal matrix as depth increases when the activations are linear [1]. For networks with non-linear activations, we show that mean-field approximations accurately predict the spectra of the representations. Since the mean-field approximation incurs an error at each layer, one might expect this prediction error to propagate with depth. However, we prove that the error does not accumulate with depth when the chain of representations is geometrically ergodic [2]. This allows us to establish non-asymptotic concentration bounds for mean-field predictions. Inspired by this result, we advocate bridging the gap between mean-field and finite-width neural networks.
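The orthogonalization effect described in [1] is easy to probe numerically. The following minimal sketch (not the authors' code) stacks random linear layers followed by batch normalization and tracks how far the normalized Gram matrix of a batch is from the identity; the width, batch size, depth, and the specific orthogonality measure are illustrative assumptions.

```python
# Minimal sketch: batch normalization in a deep random *linear* network drives the
# batch Gram matrix toward (a scaled) identity, i.e. representations become almost
# orthogonal with depth. All parameter values below are assumed for illustration.
import numpy as np

rng = np.random.default_rng(0)
width, batch, depth = 64, 8, 50   # hidden width, batch size, number of layers (assumed)

def batch_norm(H):
    """Normalize each feature (row) to zero mean and unit variance across the batch."""
    H = H - H.mean(axis=1, keepdims=True)
    return H / (H.std(axis=1, keepdims=True) + 1e-8)

def orthogonality_gap(H):
    """Relative distance of the trace-normalized batch Gram matrix from the identity."""
    G = H.T @ H
    G = G / np.trace(G) * batch   # an exactly orthogonal, equal-norm batch gives G = I
    return np.linalg.norm(G - np.eye(batch)) / np.linalg.norm(np.eye(batch))

H = rng.standard_normal((width, batch))      # random input batch
for layer in range(depth):
    W = rng.standard_normal((width, width)) / np.sqrt(width)   # random Gaussian weights
    H = batch_norm(W @ H)                    # linear layer followed by batch normalization
    if (layer + 1) % 10 == 0:
        print(f"layer {layer + 1:3d}: orthogonality gap = {orthogonality_gap(H):.4f}")
```

Under this setup the printed gap shrinks as depth grows, consistent with the orthogonalization result of [1].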
References.
[1] Hadi Daneshmand, Amir Joudaki, Francis Bach. Batch Normalization Orthogonalizes Representations in Deep Random Networks. NeurIPS 2021 (spotlight).
[2] Amir Joudaki, Hadi Daneshmand, Francis Bach. On Bridging the Gap between Mean Field and Finite Width in Deep Random Neural Networks with Batch Normalization. arXiv preprint, 2022.