This section details the equations governing a fully connected (dense)
layer (see bgd.layers.fc.FullyConnected).
Layer \(k\) maps \(n^{(k-1)}\) inputs to \(n^{(k)}\) outputs. It is parametrized by
\[\theta^{(k)} = (A^{(k)}, b^{(k)}) \in \mathbb R^{n^{(k-1)} \times n^{(k)}} \times \mathbb R^{n^{(k)}},\]
and acts on a row vector \(x\) through the affine map
\[\Lambda_{(A^{(k)}, b^{(k)})}^{(k)} : \mathbb R^{n^{(k-1)}} \to \mathbb R^{n^{(k)}} : x \mapsto xA^{(k)} + b^{(k)}.\]
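For reference, here is a minimal NumPy sketch of this map applied row-wise to a batch of inputs; the function and variable names are illustrative, not the actual bgd.layers.fc.FullyConnected code:

```python
import numpy as np

def fc_forward(X, A, b):
    """Apply the affine map row-wise: each row x of X is sent to xA + b.

    X: (l, n_prev) input batch, A: (n_prev, n) weights, b: (n,) biases.
    Returns the (l, n) output batch X^{(k)}.
    """
    return X @ A + b  # b is broadcast across the l rows

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3))  # batch of l = 4 inputs, n^{(k-1)} = 3
A = rng.standard_normal((3, 2))  # n^{(k)} = 2
b = rng.standard_normal(2)
out = fc_forward(X, A, b)        # shape (4, 2)
```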
Backpropagation
The backpropagation algorithm requires \(\partial_{A^{(k)}_{i,j}}\mathcal L\),
\(\partial_{b^{(k)}_i}\mathcal L\) and \(\partial_{X^{(k-1)}_{i,j}}\mathcal L\).
In what follows, \(\ell\) denotes the batch size, \(X = X^{(k-1)}\) the input batch of the layer,
\(\varepsilon^{(k)} := \nabla_{X^{(k)}}\mathcal L\) the error signal received from layer \(k+1\)
(so \(\varepsilon^{(k)}_{\alpha,\beta} = \partial_{X^{(k)}_{\alpha,\beta}}\mathcal L\)),
and a prime denotes matrix transposition.
For the weights update:
\[\begin{split}\partial_{A^{(k)}_{i,j}}\mathcal L(X) &= \sum_{\alpha,\beta = (1,1)}^{(\ell, n^{(k)})}\varepsilon^{(k)}_{\alpha,\beta}\partial_{A^{(k)}_{i,j}}X^{(k)}_{\alpha,\beta}
= \sum_{\alpha,\beta = (1,1)}^{(\ell, n^{(k)})}\varepsilon^{(k)}_{\alpha,\beta}\sum_{\gamma=1}^{n^{(k-1)}}X_{\alpha,\gamma}\partial_{A^{(k)}_{i,j}}A^{(k)}_{\gamma,\beta} \\
&= \sum_{\alpha,\beta = (1,1)}^{(\ell, n^{(k)})}\varepsilon^{(k)}_{\alpha,\beta}X_{\alpha,i}\delta_\beta^j
= \sum_{\alpha=1}^\ell\varepsilon^{(k)}_{\alpha,j}X_{\alpha,i} = \Big(X'\varepsilon^{(k)}\Big)_{i,j}.\end{split}\]
Therefore \(\nabla_{A^{(k)}}\mathcal L = X'\varepsilon^{(k)}\).
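In code this gradient is a single matrix product; a hedged NumPy sketch, with eps standing for \(\varepsilon^{(k)}\) (names are illustrative):

```python
import numpy as np

def grad_weights(X, eps):
    """dL/dA^{(k)} = X' eps.

    X: (l, n_prev) layer input, eps: (l, n) error signal from layer k+1.
    Returns an (n_prev, n) array, the same shape as A^{(k)}.
    """
    return X.T @ eps
```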
\[\partial_{b^{(k)}_i}\mathcal L = \sum_{\alpha,\beta = (1,1)}^{(\ell, n^{(k)})}\varepsilon^{(k)}_{\alpha,\beta}\partial_{b^{(k)}_i}b^{(k)}_\beta
= \sum_{\alpha,\beta = (1,1)}^{(\ell, n^{(k)})}\varepsilon^{(k)}_{\alpha,\beta}\delta_\beta^i
= \sum_{\alpha=1}^\ell\varepsilon^{(k)}_{\alpha,i}.\]
Therefore \(\nabla_{b^{(k)}}\mathcal L\) is the column-wise sum of \(\varepsilon^{(k)}\).
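The corresponding sketch, under the same assumed names:

```python
import numpy as np

def grad_bias(eps):
    """dL/db^{(k)}: column-wise sum of the error signal.

    eps: (l, n) error signal. Returns an (n,) array, same shape as b^{(k)}.
    """
    return eps.sum(axis=0)
```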
For the signal propagation:
\[\begin{split}\partial_{X^{(k-1)}_{i,j}}\mathcal L &= \sum_{\alpha,\beta = (1,1)}^{(\ell, n^{(k)})}\varepsilon^{(k)}_{\alpha,\beta}\partial_{X^{(k-1)}_{i,j}}X^{(k)}_{\alpha,\beta}
= \sum_{\alpha,\beta = (1,1)}^{(\ell,n^{(k)})}\varepsilon^{(k)}_{\alpha,\beta}\sum_{\gamma=1}^{n^{(k-1)}}\delta_i^\alpha\delta_j^\gamma A^{(k)}_{\gamma,\beta} \\
&= \sum_{\beta=1}^{n^{(k)}}\varepsilon^{(k)}_{i,\beta}A^{(k)}_{j,\beta} = \Big(\varepsilon^{(k)}{A^{(k)}}'\Big)_{i,j}.\end{split}\]
Therefore \(\nabla_{X^{(k-1)}}\mathcal L = \varepsilon^{(k)}{A^{(k)}}'\).
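Collecting the three results, here is a sketch of the complete backward pass under the same assumed names, with a finite-difference check of the weight gradient on the linear loss \(\mathcal L = \sum_{\alpha,\beta} X^{(k)}_{\alpha,\beta}\varepsilon^{(k)}_{\alpha,\beta}\), whose gradient with respect to \(X^{(k)}\) is exactly \(\varepsilon^{(k)}\):

```python
import numpy as np

def fc_backward(X, A, eps):
    """Backward pass of the dense layer, given eps = dL/dX^{(k)}.

    Returns (dL/dA^{(k)}, dL/db^{(k)}, dL/dX^{(k-1)}) as derived above.
    """
    return X.T @ eps, eps.sum(axis=0), eps @ A.T

# Finite-difference check on the linear loss L = sum(X^{(k)} * eps),
# whose gradient with respect to X^{(k)} is exactly eps.
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3))
A = rng.standard_normal((3, 2))
b = rng.standard_normal(2)
eps = rng.standard_normal((4, 2))

grad_A, grad_b, grad_X = fc_backward(X, A, eps)

h = 1e-6
num_grad_A = np.zeros_like(A)
for i in range(A.shape[0]):
    for j in range(A.shape[1]):
        Ap, Am = A.copy(), A.copy()
        Ap[i, j] += h
        Am[i, j] -= h
        num_grad_A[i, j] = (np.sum((X @ Ap + b) * eps)
                            - np.sum((X @ Am + b) * eps)) / (2 * h)

assert np.allclose(grad_A, num_grad_A, atol=1e-4)  # checks for b and X are analogous
```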