The torch.autograd module provides the classes and functions that implement automatic differentiation of arbitrary scalar-valued functions. For any tensor, simply set requires_grad=True; after running the relevant computations, the gradient (derivative) information from the backward pass can be read out.
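As a minimal sketch of this workflow (the variable names here are my own), differentiating $f(x) = x^2$ at $x = 3$:

```python
import torch

x = torch.tensor(3.0, requires_grad=True)  # track operations on x
f = x ** 2                                 # f is a scalar output
f.backward()                               # fill x.grad with df/dx
print(x.grad)                              # tensor(6.), since df/dx = 2x = 6
```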
Create a matrix tensor x in PyTorch with $y = \mathrm{sum}(x^2+2x+1)$, then compute the derivative of y with respect to x. The program is as follows:
```python
import torch

x = torch.tensor([[1.0, 2], [3, 4]], requires_grad=True)
y = torch.sum(x ** 2 + 2 * x + 1)
y.backward()
print(x.grad)
>>> tensor([[ 4.,  6.],
            [ 8., 10.]])
```

First, a $2\times 2$ tensor $x$ is created and $y = \mathrm{sum}(x^2+2x+1)$ is computed; the resulting $y$ is a scalar:
$$y=(x_{11}^2+2x_{11}+1)+(x_{12}^2+2x_{12}+1)+(x_{21}^2+2x_{21}+1)+(x_{22}^2+2x_{22}+1)$$
Calling y.backward() then automatically computes the derivative of $y$ with respect to each element of $x$, i.e.:
$$\left[ \begin{matrix} \frac{\partial y}{\partial x_{11}} &\frac{\partial y}{\partial x_{12}}\\ \frac{\partial y}{\partial x_{21}} &\frac{\partial y}{\partial x_{22}}\\ \end{matrix} \right]= \left[ \begin{matrix} 2x_{11} + 2&2x_{12} + 2\\ 2x_{21} + 2&2x_{22} + 2\\ \end{matrix} \right]$$
Substituting the values of $x$ gives:
```
tensor([[ 4.,  6.],
        [ 8., 10.]])
```
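To double-check the analytic gradient $2x+2$ against what autograd returns, here is a small verification sketch (not part of the original example):

```python
import torch

x = torch.tensor([[1.0, 2], [3, 4]], requires_grad=True)
y = torch.sum(x ** 2 + 2 * x + 1)
y.backward()

# The analytic gradient of sum(x^2 + 2x + 1) is 2x + 2, element-wise.
analytic = 2 * x.detach() + 2
print(torch.allclose(x.grad, analytic))  # True
```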
Why do we need sum()? Can it be dropped, leaving just $y = x^2+2x+1$?

Let's try:
```python
import torch

x = torch.tensor([[1.0, 2], [3, 4]], requires_grad=True)
y = x ** 2 + 2 * x + 1
y.backward()
print(x.grad)
```

This raises an error whose message says, roughly, that .backward() can only be used on a scalar output. After consulting the documentation, the code can be revised:
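To inspect the message without crashing the script, the failing call can be wrapped in a try/except (the quoted text is what recent PyTorch versions print; treat the exact wording as approximate):

```python
import torch

x = torch.tensor([[1.0, 2], [3, 4]], requires_grad=True)
y = x ** 2 + 2 * x + 1
try:
    y.backward()
except RuntimeError as e:
    print(e)  # "grad can be implicitly created only for scalar outputs"
```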
```python
import torch

x = torch.tensor([[1.0, 2], [3, 4]], requires_grad=True)
y = x ** 2 + 2 * x + 1
y.backward(gradient=torch.tensor([[1.0, 1], [1, 1]]))
print(x.grad)
>>> tensor([[ 4.,  6.],
            [ 8., 10.]])
```

Passing a $2\times 2$ tensor of ones as the gradient argument to y.backward() makes the code run. It can be understood as follows:
$$\left[ \begin{matrix} y_{11} &y_{12}\\ y_{21}&y_{22}\\ \end{matrix} \right]= \left[ \begin{matrix} (x_{11}^2+2x_{11}+1)&(x_{12}^2+2x_{12}+1)\\ (x_{21}^2+2x_{21}+1)&(x_{22}^2+2x_{22}+1)\\ \end{matrix} \right]$$
$y$ is no longer a scalar but a matrix, and the differentiation is now element-wise:
$$\left[ \begin{matrix} \frac{\partial y_{11}}{\partial x_{11}} &\frac{\partial y_{12}}{\partial x_{12}}\\ \frac{\partial y_{21}}{\partial x_{21}} &\frac{\partial y_{22}}{\partial x_{22}}\\ \end{matrix} \right]= \left[ \begin{matrix} 2x_{11} + 2&2x_{12} + 2\\ 2x_{21} + 2&2x_{22} + 2\\ \end{matrix} \right]$$
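More generally, the gradient argument is the weight vector $v$ in the vector-Jacobian product $v^{\top}J$ that backward() computes; a tensor of ones weights every output element equally, which is exactly what wrapping the output in sum() does. A quick sketch of this equivalence (variable names are my own):

```python
import torch

# Route 1: reduce to a scalar with sum(), then call backward() with no argument.
x1 = torch.tensor([[1.0, 2], [3, 4]], requires_grad=True)
torch.sum(x1 ** 2 + 2 * x1 + 1).backward()

# Route 2: keep y as a matrix and pass an all-ones gradient tensor.
x2 = torch.tensor([[1.0, 2], [3, 4]], requires_grad=True)
(x2 ** 2 + 2 * x2 + 1).backward(gradient=torch.ones(2, 2))

print(torch.equal(x1.grad, x2.grad))  # True: both give 2x + 2
```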
What if the ones matrix is multiplied by 2, i.e. y.backward(gradient=torch.tensor([[2.0, 2], [2, 2]]))?

```python
import torch

x = torch.tensor([[1.0, 2], [3, 4]], requires_grad=True)
y = x ** 2 + 2 * x + 1
y.backward(gradient=torch.tensor([[2.0, 2], [2, 2]]))
print(x.grad)
>>> tensor([[ 8., 12.],
            [16., 20.]])
```

This is equivalent to:
$$\left[ \begin{matrix} y_{11} &y_{12}\\ y_{21}&y_{22}\\ \end{matrix} \right]= \left[ \begin{matrix} 2(x_{11}^2+2x_{11}+1)&2(x_{12}^2+2x_{12}+1)\\ 2(x_{21}^2+2x_{21}+1)&2(x_{22}^2+2x_{22}+1)\\ \end{matrix} \right]$$
so the derivatives are also doubled:
$$\left[ \begin{matrix} \frac{\partial y_{11}}{\partial x_{11}} &\frac{\partial y_{12}}{\partial x_{12}}\\ \frac{\partial y_{21}}{\partial x_{21}} &\frac{\partial y_{22}}{\partial x_{22}}\\ \end{matrix} \right]= \left[ \begin{matrix} 4x_{11} + 4&4x_{12} + 4\\ 4x_{21} + 4&4x_{22} + 4\\ \end{matrix} \right]$$
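The weights need not be uniform: each entry of gradient independently scales the derivative of the corresponding output element. A sketch with distinct per-element weights (the particular values are my own choice):

```python
import torch

x = torch.tensor([[1.0, 2], [3, 4]], requires_grad=True)
y = x ** 2 + 2 * x + 1
w = torch.tensor([[1.0, 2], [3, 4]])  # per-element weights
y.backward(gradient=w)

print(x.grad)                         # w * (2x + 2), element-wise
print(w * (2 * x.detach() + 2))       # same values: [[4., 12.], [24., 40.]]
```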
With a bit more imagination, the two previous examples can be combined.
```python
import torch

x = torch.tensor([[1.0, 2], [3, 4]], requires_grad=True)
y = x ** 2 + 2 * x + 1
z = torch.sum(x ** 2 + 2 * x + 1)
y.backward(gradient=torch.tensor([[1.0, 1], [1, 1]]))
z.backward()
print(x.grad)
>>> tensor([[ 8., 12.],
            [16., 20.]])
```

Backpropagating through both y and z, the output is the sum of the two gradients. The corresponding formula is:
$$\left[ \begin{matrix} \frac{\partial z}{\partial x_{11}} + \frac{\partial y_{11}}{\partial x_{11}} &\frac{\partial z}{\partial x_{12}} + \frac{\partial y_{12}}{\partial x_{12}}\\ \frac{\partial z}{\partial x_{21}} + \frac{\partial y_{21}}{\partial x_{21}} &\frac{\partial z}{\partial x_{22}} + \frac{\partial y_{22}}{\partial x_{22}}\\ \end{matrix} \right]= \left[ \begin{matrix} 2x_{11} + 2&2x_{12} + 2\\ 2x_{21} + 2&2x_{22} + 2\\ \end{matrix} \right]+ \left[ \begin{matrix} 2x_{11} + 2&2x_{12} + 2\\ 2x_{21} + 2&2x_{22} + 2\\ \end{matrix} \right]$$
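This illustrates autograd's general accumulation rule: each backward() call adds into .grad rather than overwriting it, which is why training loops reset gradients between steps. A sketch of resetting with x.grad.zero_() (an optimizer would do the same via optimizer.zero_grad()):

```python
import torch

x = torch.tensor([[1.0, 2], [3, 4]], requires_grad=True)

torch.sum(x ** 2 + 2 * x + 1).backward()
print(x.grad)    # first pass: 2x + 2

x.grad.zero_()   # clear the accumulated gradient before the next backward pass

(x ** 2 + 2 * x + 1).backward(gradient=torch.ones(2, 2))
print(x.grad)    # again 2x + 2, not doubled
```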