Summary

The paper introduces Layer Normalization, a normalization method that standardizes each input across all of its features, instead of standardizing each feature across all inputs in a batch as batch normalization (BN) does. Unlike BN, there is no intra-batch dependency and no need to maintain running estimates of means and variances; the computation is identical at training and inference time, so it also works in the online setting with a batch size of one. This makes the method straightforward to apply to RNNs, unlike BN, which requires computing and storing separate statistics for each time step.
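
To make the per-example computation concrete, here is a minimal NumPy sketch. The names `layer_norm`, `gain`, `bias`, and `eps` are illustrative choices, not taken from the paper, though the paper does include learned per-feature gain and bias parameters:

```python
import numpy as np

def layer_norm(x, gain, bias, eps=1e-5):
    """Standardize each input vector across its own features,
    then apply a learned per-feature gain and bias.
    Statistics are computed per example, so no batch is needed."""
    mu = x.mean(axis=-1, keepdims=True)   # mean over features, one per example
    var = x.var(axis=-1, keepdims=True)   # variance over features, one per example
    return gain * (x - mu) / np.sqrt(var + eps) + bias

# Usage: works for any batch size, including a single example (online inference).
h = np.random.randn(4, 10)   # 4 examples with 10 features each
g = np.ones(10)              # gain, typically initialized to 1
b = np.zeros(10)             # bias, typically initialized to 0
out = layer_norm(h, g, b)
print(out.mean(axis=-1))     # ~0 for every example
print(out.std(axis=-1))      # ~1 for every example
```

Because the mean and variance depend only on the current example, the same code applies unchanged at every RNN time step, which is what makes the method easy to use in recurrent networks.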