
Fundamentals of Vector Calculus


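Vector calculus comes up all the time in machine learning. It is not hard, but university math courses usually do not cover it, so some derivations in machine learning can be confusing on first reading.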

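Machine learning frequently needs derivatives between a scalar and a vector, or between two vectors. These are nothing more than derivatives of the corresponding elements. The elements can, however, be arranged in two ways: numerator layout and denominator layout. The two do not differ in substance; their results are transposes of each other. Both conventions are in use, and beginners often mix them up.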

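For example, consider $\frac{\partial \mathbf{y}}{\partial x}$, where $\mathbf{y}$ is an $n$-dimensional column vector and $x$ is a scalar. The derivative is obtained by differentiating each element of $\mathbf{y}$ with respect to $x$, but is the result a column vector or a row vector?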

In numerator layout:

$$
\frac{\partial \mathbf{y}}{\partial x} =
\begin{bmatrix}
\frac{\partial y_1}{\partial x} \\
\frac{\partial y_2}{\partial x} \\
\vdots \\
\frac{\partial y_n}{\partial x}
\end{bmatrix}
$$

In denominator layout:

$$
\frac{\partial \mathbf{y}}{\partial x} =
\begin{bmatrix}
\frac{\partial y_1}{\partial x} & \frac{\partial y_2}{\partial x} & \dots & \frac{\partial y_n}{\partial x}
\end{bmatrix}
$$

The two layouts are easy to confuse, so pick the one you are used to and stick with it. The rest of this post uses numerator layout.
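To make the difference concrete, here is a minimal sketch that approximates $\frac{\partial \mathbf{y}}{\partial x}$ with central differences and then arranges the same numbers in both layouts. The function `f`, the helper `dy_dx`, and the step size `h` are illustrative assumptions, not part of the original text; only the Python standard library is used.

```python
import math

def dy_dx(f, x, h=1e-6):
    """Central-difference derivative of a vector-valued f at a scalar x.

    Returns a flat list with one entry per component of y = f(x).
    """
    plus, minus = f(x + h), f(x - h)
    return [(p - m) / (2 * h) for p, m in zip(plus, minus)]

def f(x):
    # Hypothetical example: y(x) = [x^2, sin(x), 3x]
    return [x * x, math.sin(x), 3.0 * x]

d = dy_dx(f, 1.0)

# Same entries, two arrangements:
numerator_layout = [[v] for v in d]   # n x 1 column vector
denominator_layout = [d]              # 1 x n row vector

print(len(numerator_layout), len(numerator_layout[0]))      # rows, cols
print(len(denominator_layout), len(denominator_layout[0]))  # rows, cols
```

The values are identical in both cases; only the shape changes, which is why the two layouts differ by a transpose.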

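Notation: lowercase bold denotes a vector, uppercase bold denotes a matrix, and lowercase italic denotes a scalar. The symbols a, b, c, and d denote quantities that do not depend on x; u = u(x) and v = v(x) depend on x; f, g, and h are functions.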

$$
\frac{\partial y}{\partial \mathbf{x}} =
\begin{bmatrix}
\frac{\partial y}{\partial x_1} & \frac{\partial y}{\partial x_2} & \dots & \frac{\partial y}{\partial x_n}
\end{bmatrix}
$$

$$
\frac{\partial \mathbf{y}}{\partial \mathbf{x}} =
\begin{bmatrix}
\frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} & \cdots & \frac{\partial y_1}{\partial x_n} \\
\frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2} & \cdots & \frac{\partial y_2}{\partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial y_m}{\partial x_1} & \frac{\partial y_m}{\partial x_2} & \cdots & \frac{\partial y_m}{\partial x_n}
\end{bmatrix}
$$

This matrix is also called the Jacobian matrix.
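As a rough illustration, the Jacobian in numerator layout can be approximated column by column with central differences. The helper `jacobian` and the example map `f` are hypothetical names chosen for this sketch; it assumes `f` takes and returns plain Python lists.

```python
def jacobian(f, x, h=1e-6):
    """Numerator-layout Jacobian: J[i][j] approximates d f_i / d x_j.

    For f mapping R^n to R^m, the result has m rows and n columns.
    """
    m, n = len(f(x)), len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        xp = list(x); xp[j] += h   # perturb only the j-th input upward
        xm = list(x); xm[j] -= h   # and downward
        fp, fm = f(xp), f(xm)
        for i in range(m):
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    return J

def f(x):
    # Hypothetical map from R^3 to R^2
    return [x[0] * x[1], x[1] + x[2] ** 2]

J = jacobian(f, [1.0, 2.0, 3.0])
# Analytic Jacobian at this point: [[x2, x1, 0], [0, 1, 2*x3]] = [[2, 1, 0], [0, 1, 6]]
for row in J:
    print(row)
```

Row $i$ follows the numerator index and column $j$ follows the denominator index, so a map from $\mathbb{R}^n$ to $\mathbb{R}^m$ gives an $m \times n$ matrix.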

$$
\frac{\partial y}{\partial \mathbf{X}} =
\begin{bmatrix}
\frac{\partial y}{\partial x_{11}} & \frac{\partial y}{\partial x_{21}} & \cdots & \frac{\partial y}{\partial x_{p1}} \\
\frac{\partial y}{\partial x_{12}} & \frac{\partial y}{\partial x_{22}} & \cdots & \frac{\partial y}{\partial x_{p2}} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial y}{\partial x_{1q}} & \frac{\partial y}{\partial x_{2q}} & \cdots & \frac{\partial y}{\partial x_{pq}}
\end{bmatrix}
$$

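This looks complicated, but the pattern is easy to spot: in numerator layout, the indices of the numerator are arranged like the numerator itself, while the indices of the denominator are arranged like the transpose of the denominator.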
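A small worked case may help. Assume, purely for illustration, that $\mathbf{X}$ is $2 \times 3$ (so $p = 2$, $q = 3$). The derivative of a scalar $y$ with respect to $\mathbf{X}$ then has the shape of $\mathbf{X}^\top$, that is $3 \times 2$:

$$
\frac{\partial y}{\partial \mathbf{X}} =
\begin{bmatrix}
\frac{\partial y}{\partial x_{11}} & \frac{\partial y}{\partial x_{21}} \\
\frac{\partial y}{\partial x_{12}} & \frac{\partial y}{\partial x_{22}} \\
\frac{\partial y}{\partial x_{13}} & \frac{\partial y}{\partial x_{23}}
\end{bmatrix}
$$

For instance, with $y = \mathbf{a}^\top \mathbf{X} \mathbf{b}$ we have $\frac{\partial y}{\partial x_{ij}} = a_i b_j$, so the entry in row $j$, column $i$ is $a_i b_j$ and the whole matrix is $\mathbf{b}\mathbf{a}^\top$. The vectors $\mathbf{a}$ and $\mathbf{b}$ are assumed constant here.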

A few derivative formulas come up often and are listed here:

$$
\frac{\partial \mathbf{A}\mathbf{x}}{\partial \mathbf{x}} = \mathbf{A}
$$

$$
\frac{\partial \mathbf{x}^\top \mathbf{A}}{\partial \mathbf{x}} = \mathbf{A}^\top
$$

$$
\frac{\partial \mathbf{x}^\top \mathbf{x}}{\partial \mathbf{x}} = 2\mathbf{x}^\top
$$

$$
\frac{\partial \mathbf{x}^\top \mathbf{A} \mathbf{x}}{\partial \mathbf{x}} = \mathbf{x}^\top(\mathbf{A} + \mathbf{A}^\top)
$$

If $\mathbf{A}$ is symmetric, the last formula simplifies to:

$$
\begin{split}
\frac{\partial \mathbf{x}^\top \mathbf{A} \mathbf{x}}{\partial \mathbf{x}}
&= \mathbf{x}^\top(\mathbf{A} + \mathbf{A}^\top) \\
&= 2\mathbf{x}^\top\mathbf{A}
\end{split}
$$
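The quadratic-form formula can be sanity-checked numerically. The sketch below compares a central-difference derivative of $\mathbf{x}^\top \mathbf{A} \mathbf{x}$ against $\mathbf{x}^\top(\mathbf{A} + \mathbf{A}^\top)$ for a deliberately non-symmetric $\mathbf{A}$. The matrix, the point `x`, and the helper names are assumptions made for this example.

```python
def grad_row(f, x, h=1e-6):
    """Central-difference derivative of a scalar f with respect to vector x.

    Returned as a flat list, read as a 1 x n row (numerator layout).
    """
    g = []
    for j in range(len(x)):
        xp = list(x); xp[j] += h
        xm = list(x); xm[j] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g

A = [[1.0, 2.0],
     [0.0, 3.0]]        # deliberately non-symmetric
x = [1.0, -2.0]

def quad(x):
    # x^T A x written out as a double sum
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))

numeric = grad_row(quad, x)

# Closed form x^T (A + A^T): entry j is sum_i x_i * (A[i][j] + A[j][i])
analytic = [sum(x[i] * (A[i][j] + A[j][i]) for i in range(2)) for j in range(2)]

print(numeric)
print(analytic)
```

If `A` is replaced by a symmetric matrix, the same check can be run against `2 * x^T A`.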

Derivatives of sums and products:

$$
\frac{\partial (\mathbf{u} + \mathbf{v})}{\partial \mathbf{x}} =
\frac{\partial \mathbf{u}}{\partial \mathbf{x}} + \frac{\partial \mathbf{v}}{\partial \mathbf{x}}
$$

$$
\frac{\partial (\mathbf{u} \cdot \mathbf{v})}{\partial \mathbf{x}} =
\frac{\partial \mathbf{u}^\top \mathbf{v}}{\partial \mathbf{x}} =
\mathbf{u}^\top \frac{\partial \mathbf{v}}{\partial \mathbf{x}} +
\mathbf{v}^\top \frac{\partial \mathbf{u}}{\partial \mathbf{x}}
$$
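A quick numerical check of the dot-product rule, under the assumption that $\mathbf{u}$ and $\mathbf{v}$ are the small hand-picked functions below (they are illustrative only). Both sides are $1 \times n$ rows in numerator layout.

```python
def jacobian(f, x, h=1e-6):
    """Numerator-layout Jacobian by central differences: J[i][j] ~ d f_i / d x_j."""
    m, n = len(f(x)), len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        xp = list(x); xp[j] += h
        xm = list(x); xm[j] -= h
        fp, fm = f(xp), f(xm)
        for i in range(m):
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    return J

def row_times_matrix(r, M):
    """Compute r^T M for a vector r (as a list) and a matrix M (list of rows)."""
    return [sum(r[i] * M[i][j] for i in range(len(r))) for j in range(len(M[0]))]

# Hypothetical example functions of x in R^2
def u(x):
    return [x[0] * x[1], x[1] ** 2]

def v(x):
    return [x[0] + x[1], 3.0 * x[0]]

def dot_uv(x):
    # u^T v, wrapped in a list so the jacobian helper can treat it as a 1-vector
    return [sum(a * b for a, b in zip(u(x), v(x)))]

x0 = [1.0, 2.0]
lhs = jacobian(dot_uv, x0)[0]                           # d(u^T v)/dx
rhs = [a + b for a, b in zip(
    row_times_matrix(u(x0), jacobian(v, x0)),           # u^T dv/dx
    row_times_matrix(v(x0), jacobian(u, x0)))]          # v^T du/dx

print(lhs)
print(rhs)
```

For these particular functions, $\mathbf{u}^\top\mathbf{v} = x_1^2 x_2 + 4 x_1 x_2^2$, so both sides should be close to $[20, 17]$ at the chosen point.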

Chain rule:

$$
\frac{\partial \mathbf{f}(\mathbf{u})}{\partial \mathbf{x}} =
\frac{\partial \mathbf{f}(\mathbf{u})}{\partial \mathbf{u}}
\frac{\partial \mathbf{u}}{\partial \mathbf{x}}
$$
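In numerator layout the chain rule is an ordinary matrix product of Jacobians, taken in the order written. The sketch below checks this with finite differences; `u_of_x`, `f_of_u`, and the evaluation point are hypothetical choices for illustration.

```python
import math

def jacobian(f, x, h=1e-6):
    """Numerator-layout Jacobian by central differences: J[i][j] ~ d f_i / d x_j."""
    m, n = len(f(x)), len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        xp = list(x); xp[j] += h
        xm = list(x); xm[j] -= h
        fp, fm = f(xp), f(xm)
        for i in range(m):
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    return J

def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def u_of_x(x):
    # Hypothetical inner map from R^2 to R^3
    return [x[0] * x[1], x[0] + x[1], x[1] ** 2]

def f_of_u(u):
    # Hypothetical outer map from R^3 to R^2
    return [u[0] + u[1] * u[2], math.sin(u[0])]

x0 = [0.5, 1.5]

# Left-hand side: differentiate the composition directly (2 x 2)
direct = jacobian(lambda x: f_of_u(u_of_x(x)), x0)

# Right-hand side: (df/du at u(x0)) times (du/dx at x0), shapes (2 x 3)(3 x 2)
chained = matmul(jacobian(f_of_u, u_of_x(x0)), jacobian(u_of_x, x0))

max_diff = max(abs(direct[i][j] - chained[i][j]) for i in range(2) for j in range(2))
print(max_diff)
```

The shapes line up without any transposes, which is one practical reason to prefer numerator layout; in denominator layout the factors would appear in the opposite order.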

For more details, see: Matrix calculus - Wikipedia