The Softmax Function
Definition of Softmax
For a vector $\mathbf{x} = (x_1, \dots, x_n)$, softmax is defined componentwise as: $$ y_i(\mathbf{x}) = \frac{e^{x_i}}{\sum_{k=1}^n e^{x_k}}, \quad i = 1, \dots, n $$ Write: $$ S = \sum_{k=1}^n e^{x_k} $$
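As a quick illustration, here is a minimal NumPy sketch of this definition. Subtracting the maximum before exponentiating is not part of the math above (softmax is invariant to shifting every input by a constant), but it is the standard way to avoid overflow:

```python
import numpy as np

def softmax(x):
    # Shift by the max: softmax(x) == softmax(x - c) for any constant c,
    # and this keeps exp() from overflowing for large inputs.
    e = np.exp(x - np.max(x))
    return e / e.sum()

x = np.array([2.0, 1.0, 0.1])
y = softmax(x)
print(y, y.sum())  # all components positive, summing to 1
```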
Differentiating Softmax
We want to compute: $$ \frac{\partial y_i}{\partial x_j} $$ Case 1: $i = j$. In this case:
- $\frac{\partial e^{x_i}}{\partial x_i} = e^{x_i}$
- $\frac{\partial S}{\partial x_i} = e^{x_i}$
$$ \begin{align*} \frac{\partial y_i}{\partial x_i} &= \frac{\partial}{\partial x_i} \left( \frac{e^{x_i}}{S} \right) \\ &= \frac{\frac{\partial e^{x_i}}{\partial x_i} S - e^{x_i}\frac{\partial S}{\partial x_i}}{S^2} \\ &= \frac{e^{x_i} S - e^{x_i} e^{x_i}}{S^2} \\ &= \frac{e^{x_i}}{S} \left( 1 - \frac{e^{x_i}}{S} \right) \\ &= y_i (1 - y_i) \end{align*} $$
Case 2: $i \neq j$. In this case:
- $\frac{\partial e^{x_i}}{\partial x_j} = 0$
- $\frac{\partial S}{\partial x_j} = e^{x_j}$
$$
\begin{align*}
\frac{\partial y_i}{\partial x_j} &= \frac{\partial}{\partial x_j} \left( \frac{e^{x_i}}{S} \right) \\
&= \frac{\frac{\partial e^{x_i}}{\partial x_j} S - e^{x_i}\frac{\partial S}{\partial x_j}}{S^2} \\
&= -\frac{e^{x_i} e^{x_j}}{S^2} \\
&= -y_i y_j
\end{align*}
$$
Unified Form (the Jacobian Matrix)
$$
\boxed{\frac{\partial y_i}{\partial x_j} = y_i (\delta_{ij} - y_j)}
$$
where $\delta_{ij}$ is the Kronecker delta.
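Both cases can also be checked symbolically. A small SymPy sketch, purely illustrative and not part of the original derivation:

```python
import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3', real=True)
S = sp.exp(x1) + sp.exp(x2) + sp.exp(x3)
y1, y2 = sp.exp(x1) / S, sp.exp(x2) / S

# Case i = j: d y_1 / d x_1 should equal y_1 (1 - y_1)
print(sp.simplify(sp.diff(y1, x1) - y1 * (1 - y1)))  # 0

# Case i != j: d y_1 / d x_2 should equal -y_1 y_2
print(sp.simplify(sp.diff(y1, x2) + y1 * y2))        # 0
```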
The corresponding Jacobian matrix (with $y$ as a column vector):
$$
J = \text{diag}(y) - y y^T
$$
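To sanity-check this matrix form numerically, the sketch below (reusing the softmax helper from earlier) builds $J = \text{diag}(y) - y y^T$ and compares each column against a finite-difference estimate:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

x = np.array([2.0, 1.0, 0.1])
y = softmax(x)
J = np.diag(y) - np.outer(y, y)  # J[i, j] = y_i (delta_ij - y_j)

# Finite-difference estimate: column j is the change in y per change in x_j
eps, n = 1e-6, len(x)
J_num = np.zeros((n, n))
for j in range(n):
    d = np.zeros(n)
    d[j] = eps
    J_num[:, j] = (softmax(x + d) - softmax(x - d)) / (2 * eps)

print(np.max(np.abs(J - J_num)))  # tiny, on the order of 1e-10
```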
Summary
Let
$$ \begin{cases} y = \mathrm{softmax}(z) \\ L = f(y) \end{cases} $$
where $L$ denotes the loss, $f(\cdot)$ is the loss function, $y = \left[ y_1\ \ y_2\ \ y_3 \right]$, and $z = \left[ z_1\ \ z_2\ \ z_3 \right]$. Suppose we now want $\frac{\partial L}{\partial z_j}$; how do we compute it?
By the chain rule, $\frac{\partial L}{\partial z_j} = \frac{\partial L}{\partial y} \frac{\partial y}{\partial z_j}$, so let us look at each of the two factors.
From the derivation above, $$ \begin{cases} \frac{\partial y_i}{\partial z_j} = y_i(1 - y_i), & \text{if } i = j \\ \frac{\partial y_i}{\partial z_j} = -y_i y_j, & \text{if } i \neq j \end{cases} $$ We can now derive $\frac{\partial L}{\partial z_j}$:
$$
\frac{\partial L}{\partial z_j} = \frac{\partial L}{\partial y} \frac{\partial y}{\partial z_j} = \sum_{i=1}^l \frac{\partial L}{\partial y_i} \frac{\partial y_i}{\partial z_j} = dy_j\, y_j (1 - y_j) - \sum_{i \neq j} dy_i\, y_i y_j = y_j\left( dy_j - \sum_{i=1}^l y_i\, dy_i \right)
$$
where $dy_i$ is shorthand for $\frac{\partial L}{\partial y_i}$.
For example, to compute $\frac{\partial L}{\partial z_1}$, substitute $\frac{\partial L}{\partial y} = \left[ m_1\ \ m_2\ \ m_3 \right]$ into the formula above, written as $dy = \left[ dy_1\ \ dy_2\ \ dy_3 \right] = \left[ m_1\ \ m_2\ \ m_3 \right]$; this gives:
$$
\frac{\partial L}{\partial z_1} = m_1\left( y_1 - y_1^2 \right) - m_2 y_1 y_2 - m_3 y_1 y_3
$$
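This example can be verified numerically. The sketch below assumes, purely for illustration, a linear loss $L = m \cdot y$ so that $\frac{\partial L}{\partial y}$ is exactly $\left[ m_1\ \ m_2\ \ m_3 \right]$, then compares the formula against a finite difference in $z_1$:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

m = np.array([0.3, -0.5, 0.2])  # plays the role of dL/dy = [m1, m2, m3]
z = np.array([1.0, 2.0, 0.5])
y = softmax(z)

# The worked formula for dL/dz_1
dL_dz1 = m[0] * (y[0] - y[0]**2) - m[1] * y[0] * y[1] - m[2] * y[0] * y[2]

# Finite-difference check with the illustrative loss L(z) = m . softmax(z)
L = lambda z: m @ softmax(z)
eps = 1e-6
e1 = np.array([eps, 0.0, 0.0])
print(dL_dz1, (L(z + e1) - L(z - e1)) / (2 * eps))  # the two should agree
```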
Now, collecting all components of $z$, we can write $\frac{\partial L}{\partial z}$ as a matrix expression:
$$
\begin{aligned}
\frac{\partial L}{\partial z} &= \frac{\partial L}{\partial y} \frac{\partial y}{\partial z} = dy\left( \text{diag}(y) - y^T y \right) \\
&= \left[ m_1\ \ m_2\ \ m_3 \right] \left( \begin{bmatrix} y_1 & 0 & 0 \\ 0 & y_2 & 0 \\ 0 & 0 & y_3 \end{bmatrix} - \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} \left[ y_1\ \ y_2\ \ y_3 \right] \right) \\
&= \left[ m_1\ \ m_2\ \ m_3 \right] \begin{bmatrix} y_1 - y_1^2 & -y_1 y_2 & -y_1 y_3 \\ -y_2 y_1 & y_2 - y_2^2 & -y_2 y_3 \\ -y_3 y_1 & -y_3 y_2 & y_3 - y_3^2 \end{bmatrix}
\end{aligned}
$$
(Here $y$ and $dy$ are row vectors, so $y^T y$ is the outer product; under the column-vector convention used earlier, the same matrix is $y y^T$.)
To close, remember these two key results:
$$ \begin{aligned}
\frac{\partial L}{\partial z} &= \frac{\partial L}{\partial y} \frac{\partial y}{\partial z} = dy\left( \text{diag}(y) - y^T y \right) \\
\frac{\partial L}{\partial z_j} &= y_j\left( dy_j - \sum_{i=1}^l y_i\, dy_i \right)
\end{aligned} $$
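In practice the second, per-component form is what a backward pass implements, since it needs only $O(n)$ work instead of materializing the $n \times n$ Jacobian. A minimal sketch showing the two forms agree:

```python
import numpy as np

def softmax_backward(dy, y):
    # dz_j = y_j (dy_j - sum_i y_i dy_i), the vectorized per-component form
    return y * (dy - np.dot(y, dy))

y = np.array([0.6, 0.3, 0.1])    # softmax output
dy = np.array([0.3, -0.5, 0.2])  # upstream gradient dL/dy

dz_fast = softmax_backward(dy, y)
dz_full = dy @ (np.diag(y) - np.outer(y, y))  # full-Jacobian product
print(np.allclose(dz_fast, dz_full))  # True
```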