Infinitely Wide Nets
Many problems in the theoretical understanding of neural nets stem from the difficulty of reasoning about their training dynamics. In particular, one cannot generally guarantee global convergence of gradient descent, even though it is typically observed for realistic networks and data. Moreover, all generalization bounds that do not take the training dynamics into account turn out to be vacuous.

Fortunately, the training dynamics of neural nets simplify substantially in the limit of infinite width. One such limit, the NTK limit, is driven by a kernel that stays constant throughout training and can be estimated via Monte Carlo. We shall discuss how this limit can be used to obtain optimization and generalization guarantees for sufficiently wide networks. Another limit, the mean-field limit, leads to a quantitatively different limit model.

The reason we have two different limits is the difference in how hyperparameters scale with width. We shall show how different hyperparameter scalings result in different limit models, and discuss which limit model is a better proxy for realistic finite-width nets.
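As a small illustration of the Monte-Carlo estimate mentioned above, here is a minimal sketch (all function names and hyperparameter values are illustrative, not from the course). For a one-hidden-layer ReLU net f(x) = (1/√m) Σ_j a_j relu(w_j·x), the empirical NTK at one random initialization is the inner product of parameter gradients, K(x, x') = ⟨∇_θ f(x), ∇_θ f(x')⟩; averaging it over initializations approximates the limiting kernel.

```python
import numpy as np

def empirical_ntk(x1, x2, W, a):
    """NTK of f(x) = a @ relu(W x) / sqrt(m) at parameters (W, a)."""
    m = a.shape[0]
    z1, z2 = W @ x1, W @ x2
    h1, h2 = np.maximum(z1, 0.0), np.maximum(z2, 0.0)   # relu(Wx)
    d1, d2 = (z1 > 0).astype(float), (z2 > 0).astype(float)  # relu'(Wx)
    # contribution of gradients w.r.t. the output weights a
    k_a = (h1 @ h2) / m
    # contribution of gradients w.r.t. the input weights W
    k_w = (x1 @ x2) * np.sum(a**2 * d1 * d2) / m
    return k_a + k_w

def mc_ntk(x1, x2, width=512, n_samples=200, seed=0):
    """Monte-Carlo estimate of the limiting kernel: average over inits."""
    rng = np.random.default_rng(seed)
    vals = []
    for _ in range(n_samples):
        W = rng.standard_normal((width, x1.shape[0]))
        a = rng.standard_normal(width)
        vals.append(empirical_ntk(x1, x2, W, a))
    return float(np.mean(vals))
```

As the width grows, each single-initialization sample concentrates around the limiting kernel, so fewer Monte-Carlo samples are needed for a given accuracy.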
Feel free to join!