torch.nn.LayerNorm(normalized_shape, eps=1e-05, elementwise_affine=True, device=None, dtype=None)
The corresponding operation is expressed mathematically as:

$$y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta$$
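where $\mathrm{E}[x]$ is the expectation (mean) and $\mathrm{Var}[x]$ is the variance of $x$, both computed over the dimensions given by normalized_shape; $\gamma$ and $\beta$ are learnable parameters; and $\epsilon > 0$ is a small constant (eps, default 1e-05) added to the variance for numerical stability.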
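To make the formula concrete, here is a minimal sketch that evaluates it by hand and compares the result with nn.LayerNorm. The tensor sizes and the names `manual` and `max_diff` are illustrative assumptions, and the variance is taken as the biased estimator, which is what the module uses.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(4, 3, 8, 8)  # small illustrative input of shape (N, C, H, W)

layer_norm = nn.LayerNorm([3, 8, 8])  # eps defaults to 1e-05

# Statistics are taken per sample over the last three dimensions.
dims = (1, 2, 3)
mean = x.mean(dim=dims, keepdim=True)
var = x.var(dim=dims, unbiased=False, keepdim=True)  # biased variance

# gamma is layer_norm.weight (initialized to ones),
# beta is layer_norm.bias (initialized to zeros).
manual = (x - mean) / torch.sqrt(var + layer_norm.eps)
manual = manual * layer_norm.weight + layer_norm.bias

# Largest elementwise difference from the built-in module.
max_diff = (manual - layer_norm(x)).abs().max().item()
print(max_diff)
```

The printed difference should be at the level of floating-point rounding, which suggests the hand-written expression and the module compute the same quantity.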
import torch
import torch.nn as nn

N, C, H, W = 12, 3, 256, 256
input = torch.randn(N, C, H, W)  # input data
# Normalize over the last three dimensions (i.e., the channel and spatial dimensions)
# as shown in the image below
layer_norm = nn.LayerNorm([C, H, W])
output = layer_norm(input)
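As a rough check of what this normalization does, the sketch below inspects the output statistics and the parameter shapes. It uses smaller sizes than the example above to keep it fast; the names `per_sample_mean`, `per_sample_var`, and `plain` are assumptions introduced for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
N, C, H, W = 2, 3, 16, 16  # smaller than the example above, to keep it fast
x = torch.randn(N, C, H, W) * 5.0 + 2.0  # deliberately not zero-mean, unit-variance

layer_norm = nn.LayerNorm([C, H, W])
y = layer_norm(x)

# One mean and one variance per sample, taken over (C, H, W).
per_sample_mean = y.mean(dim=(1, 2, 3))
per_sample_var = y.var(dim=(1, 2, 3), unbiased=False)
print(per_sample_mean, per_sample_var)

# gamma (weight) and beta (bias) have the same shape as normalized_shape.
print(layer_norm.weight.shape, layer_norm.bias.shape)

# With elementwise_affine=False the module has no learnable parameters.
plain = nn.LayerNorm([C, H, W], elementwise_affine=False)
print(len(list(plain.parameters())))
```

With freshly initialized parameters, each sample's output should have a mean close to 0 and a variance close to 1, regardless of the scale and offset of the input.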